Speed Up Nested For Loops with NumPy
I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if
statements like this:
s = # some string of size n
opt = numpy.zeros(shape=(n, n))
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where
and numpy.fill
to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy
methods) can be made to work with a series of if
statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy
libraries to make it better optimized for Python?
python numpy optimization
add a comment |
I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if
statements like this:
s = # some string of size n
opt = numpy.zeros(shape=(n, n))
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where
and numpy.fill
to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy
methods) can be made to work with a series of if
statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy
libraries to make it better optimized for Python?
python numpy optimization
add a comment |
I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if
statements like this:
s = # some string of size n
opt = numpy.zeros(shape=(n, n))
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where
and numpy.fill
to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy
methods) can be made to work with a series of if
statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy
libraries to make it better optimized for Python?
python numpy optimization
I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if
statements like this:
s = # some string of size n
opt = numpy.zeros(shape=(n, n))
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where
and numpy.fill
to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy
methods) can be made to work with a series of if
statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy
libraries to make it better optimized for Python?
python numpy optimization
python numpy optimization
asked Nov 22 '18 at 0:38
user3250889user3250889
767
767
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.
For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.
Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.
add a comment |
Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2
elif s[i] == s[j]:
opt[i, j] = 3
else:
opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select
to construct your desired opt
:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5
and s='abbca'
, this would yield:
array([[1, 2, 4, 4, 3],
[4, 1, 2, 4, 4],
[4, 3, 1, 2, 4],
[4, 4, 4, 1, 2],
[3, 4, 4, 4, 1]])
add a comment |
Here is a vectorized solution.
It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.
Step-by-step explanation:
evaluate s[i] == s[j] in the diagonal view.
only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction
replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction
finally, take the cumulative maximum in bottom up and left right direction
As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest
below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).
import numpy as np
def loopy(s):
n = len(s)
opt = np.zeros(shape=(n, n), dtype=int)
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
return opt
def vect(s):
n = len(s)
h = (n+1) // 2
s = np.array([s, s]).view('U1').ravel()
opt = np.zeros((n+2*h-1, n+2*h-1), int)
y, x = opt.strides
hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
p, o, c = np.ogrid[:2, :h, :n]
hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
np.einsum('ii->i', opt)[...] = 1
hh[...] = hh.cumsum(axis=1)
opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
return np.maximum.accumulate(opt, axis=1)
def stresstest(n=100):
from string import ascii_lowercase
import random
from timeit import timeit
Tv, Tl = 0, 0
for i in range(n):
s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
print(s, end=' ')
assert np.all(vect(s) == loopy(s))
Tv += timeit(lambda: vect(s), number=10)
Tl += timeit(lambda: loopy(s), number=10)
print()
print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53422382%2fspeed-up-nested-for-loops-with-numpy%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.
For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.
Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.
add a comment |
I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.
For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.
Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.
add a comment |
I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.
For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.
Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.
I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.
For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.
Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.
answered Nov 22 '18 at 1:32
user10664643user10664643
286
286
add a comment |
add a comment |
Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2
elif s[i] == s[j]:
opt[i, j] = 3
else:
opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select
to construct your desired opt
:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5
and s='abbca'
, this would yield:
array([[1, 2, 4, 4, 3],
[4, 1, 2, 4, 4],
[4, 3, 1, 2, 4],
[4, 4, 4, 1, 2],
[3, 4, 4, 4, 1]])
add a comment |
Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2
elif s[i] == s[j]:
opt[i, j] = 3
else:
opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select
to construct your desired opt
:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5
and s='abbca'
, this would yield:
array([[1, 2, 4, 4, 3],
[4, 1, 2, 4, 4],
[4, 3, 1, 2, 4],
[4, 4, 4, 1, 2],
[3, 4, 4, 4, 1]])
add a comment |
Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2
elif s[i] == s[j]:
opt[i, j] = 3
else:
opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select
to construct your desired opt
:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5
and s='abbca'
, this would yield:
array([[1, 2, 4, 4, 3],
[4, 1, 2, 4, 4],
[4, 3, 1, 2, 4],
[4, 4, 4, 1, 2],
[3, 4, 4, 4, 1]])
Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2
elif s[i] == s[j]:
opt[i, j] = 3
else:
opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select
to construct your desired opt
:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5
and s='abbca'
, this would yield:
array([[1, 2, 4, 4, 3],
[4, 1, 2, 4, 4],
[4, 3, 1, 2, 4],
[4, 4, 4, 1, 2],
[3, 4, 4, 4, 1]])
edited Nov 22 '18 at 2:52
answered Nov 22 '18 at 1:31


teltel
7,39621431
7,39621431
add a comment |
add a comment |
Here is a vectorized solution.
It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.
Step-by-step explanation:
evaluate s[i] == s[j] in the diagonal view.
only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction
replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction
finally, take the cumulative maximum in bottom up and left right direction
As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest
below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).
import numpy as np
def loopy(s):
n = len(s)
opt = np.zeros(shape=(n, n), dtype=int)
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
return opt
def vect(s):
n = len(s)
h = (n+1) // 2
s = np.array([s, s]).view('U1').ravel()
opt = np.zeros((n+2*h-1, n+2*h-1), int)
y, x = opt.strides
hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
p, o, c = np.ogrid[:2, :h, :n]
hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
np.einsum('ii->i', opt)[...] = 1
hh[...] = hh.cumsum(axis=1)
opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
return np.maximum.accumulate(opt, axis=1)
def stresstest(n=100):
from string import ascii_lowercase
import random
from timeit import timeit
Tv, Tl = 0, 0
for i in range(n):
s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
print(s, end=' ')
assert np.all(vect(s) == loopy(s))
Tv += timeit(lambda: vect(s), number=10)
Tl += timeit(lambda: loopy(s), number=10)
print()
print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798
add a comment |
Here is a vectorized solution.
It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.
Step-by-step explanation:
evaluate s[i] == s[j] in the diagonal view.
only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction
replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction
finally, take the cumulative maximum in bottom up and left right direction
As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest
below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).
import numpy as np
def loopy(s):
n = len(s)
opt = np.zeros(shape=(n, n), dtype=int)
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
return opt
def vect(s):
n = len(s)
h = (n+1) // 2
s = np.array([s, s]).view('U1').ravel()
opt = np.zeros((n+2*h-1, n+2*h-1), int)
y, x = opt.strides
hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
p, o, c = np.ogrid[:2, :h, :n]
hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
np.einsum('ii->i', opt)[...] = 1
hh[...] = hh.cumsum(axis=1)
opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
return np.maximum.accumulate(opt, axis=1)
def stresstest(n=100):
from string import ascii_lowercase
import random
from timeit import timeit
Tv, Tl = 0, 0
for i in range(n):
s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
print(s, end=' ')
assert np.all(vect(s) == loopy(s))
Tv += timeit(lambda: vect(s), number=10)
Tl += timeit(lambda: loopy(s), number=10)
print()
print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798
add a comment |
Here is a vectorized solution.
It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.
Step-by-step explanation:
evaluate s[i] == s[j] in the diagonal view.
only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction
replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction
finally, take the cumulative maximum in bottom up and left right direction
As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest
below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).
import numpy as np
def loopy(s):
n = len(s)
opt = np.zeros(shape=(n, n), dtype=int)
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
return opt
def vect(s):
n = len(s)
h = (n+1) // 2
s = np.array([s, s]).view('U1').ravel()
opt = np.zeros((n+2*h-1, n+2*h-1), int)
y, x = opt.strides
hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
p, o, c = np.ogrid[:2, :h, :n]
hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
np.einsum('ii->i', opt)[...] = 1
hh[...] = hh.cumsum(axis=1)
opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
return np.maximum.accumulate(opt, axis=1)
def stresstest(n=100):
from string import ascii_lowercase
import random
from timeit import timeit
Tv, Tl = 0, 0
for i in range(n):
s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
print(s, end=' ')
assert np.all(vect(s) == loopy(s))
Tv += timeit(lambda: vect(s), number=10)
Tl += timeit(lambda: loopy(s), number=10)
print()
print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798
Here is a vectorized solution.
It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.
Step-by-step explanation:
evaluate s[i] == s[j] in the diagonal view.
only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction
replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction
finally, take the cumulative maximum in bottom up and left right direction
As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest
below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).
import numpy as np
def loopy(s):
n = len(s)
opt = np.zeros(shape=(n, n), dtype=int)
for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
return opt
def vect(s):
n = len(s)
h = (n+1) // 2
s = np.array([s, s]).view('U1').ravel()
opt = np.zeros((n+2*h-1, n+2*h-1), int)
y, x = opt.strides
hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
p, o, c = np.ogrid[:2, :h, :n]
hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
np.einsum('ii->i', opt)[...] = 1
hh[...] = hh.cumsum(axis=1)
opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
return np.maximum.accumulate(opt, axis=1)
def stresstest(n=100):
from string import ascii_lowercase
import random
from timeit import timeit
Tv, Tl = 0, 0
for i in range(n):
s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
print(s, end=' ')
assert np.all(vect(s) == loopy(s))
Tv += timeit(lambda: vect(s), number=10)
Tl += timeit(lambda: loopy(s), number=10)
print()
print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798
answered Nov 22 '18 at 2:56
Paul PanzerPaul Panzer
29.8k21240
29.8k21240
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53422382%2fspeed-up-nested-for-loops-with-numpy%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown