Speed Up Nested For Loops with NumPy

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:

s = # some string of size n

opt = numpy.zeros(shape=(n, n))



for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2 if s[i] == s[j] else 1

        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

            opt[i, j] = 2 + opt[i + 1, j - 1]

        else:

            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where and numpy.fill to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy libraries to make it better optimized for Python?

asked Nov 22 '18 at 0:38

user3250889

767

add a comment |

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:

s = # some string of size n

opt = numpy.zeros(shape=(n, n))



for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2 if s[i] == s[j] else 1

        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

            opt[i, j] = 2 + opt[i + 1, j - 1]

        else:

            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

asked Nov 22 '18 at 0:38

user3250889

767

add a comment |

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:

s = # some string of size n

opt = numpy.zeros(shape=(n, n))



for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2 if s[i] == s[j] else 1

        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

            opt[i, j] = 2 + opt[i + 1, j - 1]

        else:

            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

asked Nov 22 '18 at 0:38

user3250889

767

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:

s = # some string of size n

opt = numpy.zeros(shape=(n, n))



for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2 if s[i] == s[j] else 1

        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

            opt[i, j] = 2 + opt[i + 1, j - 1]

        else:

            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

python numpy optimization

asked Nov 22 '18 at 0:38

user3250889

767

asked Nov 22 '18 at 0:38

user3250889

767

asked Nov 22 '18 at 0:38

user3250889

767

asked Nov 22 '18 at 0:38

user3250889

767

asked Nov 22 '18 at 0:38

user3250889

767

add a comment |

3 Answers
3

active

oldest

votes

I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.

For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.

Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.

answered Nov 22 '18 at 1:32

user10664643

286

add a comment |

Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.

If you instead had the simpler loop:

for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2

        elif s[i] == s[j]:

            opt[i, j] = 3

        else:

            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np



# get arrays i and j that represent the row and column indices

i,j = np.ogrid[:n, :n]

# construct an array with the characters from s

sarr = np.fromiter(s, dtype='U1').reshape(1, -1)



cond1 = i==j             # result will be a bool arr with True wherever row index equals column index

cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)

cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

For n=5 and s='abbca', this would yield:

array([[1, 2, 4, 4, 3],

       [4, 1, 2, 4, 4],

       [4, 3, 1, 2, 4],

       [4, 4, 4, 1, 2],

       [3, 4, 4, 4, 1]])

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

add a comment |

Here is a vectorized solution.

It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.

Step-by-step explanation:

evaluate s[i] == s[j] in the diagonal view.

only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction

replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction

finally, take the cumulative maximum in bottom up and left right direction

As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).

import numpy as np



def loopy(s):

    n = len(s)

    opt = np.zeros(shape=(n, n), dtype=int)

    for j in range(0, n):

        for i in range(j, -1, -1):

            if j - i == 0:

                opt[i, j] = 1

            elif j - i == 1:

                opt[i, j] = 2 if s[i] == s[j] else 1

            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

                opt[i, j] = 2 + opt[i + 1, j - 1]

            else:

                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

    return opt



def vect(s):

    n = len(s)

    h = (n+1) // 2

    s = np.array([s, s]).view('U1').ravel()

    opt = np.zeros((n+2*h-1, n+2*h-1), int)

    y, x = opt.strides

    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))

    p, o, c = np.ogrid[:2, :h, :n]

    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)

    np.einsum('ii->i', opt)[...] = 1

    hh[...] = hh.cumsum(axis=1)

    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]

    return np.maximum.accumulate(opt, axis=1)



def stresstest(n=100):

    from string import ascii_lowercase

    import random

    from timeit import timeit

    Tv, Tl = 0, 0

    for i in range(n):

        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))

        print(s, end=' ')

        assert np.all(vect(s) == loopy(s))

        Tv += timeit(lambda: vect(s), number=10)

        Tl += timeit(lambda: loopy(s), number=10)

    print()

    print(f"total time loopy {Tl}, vect {Tv}")

Demo:

>>> stresstest(20)

caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce 

total time loopy 0.2523909523151815, vect 0.03500175685621798

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53422382%2fspeed-up-nested-for-loops-with-numpy%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

answered Nov 22 '18 at 1:32

user10664643

286

add a comment |

answered Nov 22 '18 at 1:32

user10664643

286

add a comment |

answered Nov 22 '18 at 1:32

user10664643

286

answered Nov 22 '18 at 1:32

user10664643

286

answered Nov 22 '18 at 1:32

user10664643

286

answered Nov 22 '18 at 1:32

user10664643

286

answered Nov 22 '18 at 1:32

user10664643

286

add a comment |

If you instead had the simpler loop:

for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2

        elif s[i] == s[j]:

            opt[i, j] = 3

        else:

            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np



# get arrays i and j that represent the row and column indices

i,j = np.ogrid[:n, :n]

# construct an array with the characters from s

sarr = np.fromiter(s, dtype='U1').reshape(1, -1)



cond1 = i==j             # result will be a bool arr with True wherever row index equals column index

cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)

cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

For n=5 and s='abbca', this would yield:

array([[1, 2, 4, 4, 3],

       [4, 1, 2, 4, 4],

       [4, 3, 1, 2, 4],

       [4, 4, 4, 1, 2],

       [3, 4, 4, 4, 1]])

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

add a comment |

If you instead had the simpler loop:

for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2

        elif s[i] == s[j]:

            opt[i, j] = 3

        else:

            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np



# get arrays i and j that represent the row and column indices

i,j = np.ogrid[:n, :n]

# construct an array with the characters from s

sarr = np.fromiter(s, dtype='U1').reshape(1, -1)



cond1 = i==j             # result will be a bool arr with True wherever row index equals column index

cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)

cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

For n=5 and s='abbca', this would yield:

array([[1, 2, 4, 4, 3],

       [4, 1, 2, 4, 4],

       [4, 3, 1, 2, 4],

       [4, 4, 4, 1, 2],

       [3, 4, 4, 4, 1]])

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

add a comment |

If you instead had the simpler loop:

for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2

        elif s[i] == s[j]:

            opt[i, j] = 3

        else:

            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np



# get arrays i and j that represent the row and column indices

i,j = np.ogrid[:n, :n]

# construct an array with the characters from s

sarr = np.fromiter(s, dtype='U1').reshape(1, -1)



cond1 = i==j             # result will be a bool arr with True wherever row index equals column index

cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)

cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

For n=5 and s='abbca', this would yield:

array([[1, 2, 4, 4, 3],

       [4, 1, 2, 4, 4],

       [4, 3, 1, 2, 4],

       [4, 4, 4, 1, 2],

       [3, 4, 4, 4, 1]])

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

If you instead had the simpler loop:

for j in range(0, n):

    for i in range(j, -1, -1):

        if j - i == 0:

            opt[i, j] = 1

        elif j - i == 1:

            opt[i, j] = 2

        elif s[i] == s[j]:

            opt[i, j] = 3

        else:

            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np



# get arrays i and j that represent the row and column indices

i,j = np.ogrid[:n, :n]

# construct an array with the characters from s

sarr = np.fromiter(s, dtype='U1').reshape(1, -1)



cond1 = i==j             # result will be a bool arr with True wherever row index equals column index

cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)

cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

For n=5 and s='abbca', this would yield:

array([[1, 2, 4, 4, 3],

       [4, 1, 2, 4, 4],

       [4, 3, 1, 2, 4],

       [4, 4, 4, 1, 2],

       [3, 4, 4, 4, 1]])

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

edited Nov 22 '18 at 2:52

answered Nov 22 '18 at 1:31

tel

7,39621431

answered Nov 22 '18 at 1:31

tel

7,39621431

answered Nov 22 '18 at 1:31

tel

7,39621431

add a comment |

Here is a vectorized solution.

It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.

Step-by-step explanation:

evaluate s[i] == s[j] in the diagonal view.

only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction

replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction

finally, take the cumulative maximum in bottom up and left right direction

import numpy as np



def loopy(s):

    n = len(s)

    opt = np.zeros(shape=(n, n), dtype=int)

    for j in range(0, n):

        for i in range(j, -1, -1):

            if j - i == 0:

                opt[i, j] = 1

            elif j - i == 1:

                opt[i, j] = 2 if s[i] == s[j] else 1

            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

                opt[i, j] = 2 + opt[i + 1, j - 1]

            else:

                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

    return opt



def vect(s):

    n = len(s)

    h = (n+1) // 2

    s = np.array([s, s]).view('U1').ravel()

    opt = np.zeros((n+2*h-1, n+2*h-1), int)

    y, x = opt.strides

    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))

    p, o, c = np.ogrid[:2, :h, :n]

    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)

    np.einsum('ii->i', opt)[...] = 1

    hh[...] = hh.cumsum(axis=1)

    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]

    return np.maximum.accumulate(opt, axis=1)



def stresstest(n=100):

    from string import ascii_lowercase

    import random

    from timeit import timeit

    Tv, Tl = 0, 0

    for i in range(n):

        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))

        print(s, end=' ')

        assert np.all(vect(s) == loopy(s))

        Tv += timeit(lambda: vect(s), number=10)

        Tl += timeit(lambda: loopy(s), number=10)

    print()

    print(f"total time loopy {Tl}, vect {Tv}")

Demo:

>>> stresstest(20)

caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce 

total time loopy 0.2523909523151815, vect 0.03500175685621798

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

add a comment |

Here is a vectorized solution.

It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.

Step-by-step explanation:

evaluate s[i] == s[j] in the diagonal view.

only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction

replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction

finally, take the cumulative maximum in bottom up and left right direction

import numpy as np



def loopy(s):

    n = len(s)

    opt = np.zeros(shape=(n, n), dtype=int)

    for j in range(0, n):

        for i in range(j, -1, -1):

            if j - i == 0:

                opt[i, j] = 1

            elif j - i == 1:

                opt[i, j] = 2 if s[i] == s[j] else 1

            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

                opt[i, j] = 2 + opt[i + 1, j - 1]

            else:

                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

    return opt



def vect(s):

    n = len(s)

    h = (n+1) // 2

    s = np.array([s, s]).view('U1').ravel()

    opt = np.zeros((n+2*h-1, n+2*h-1), int)

    y, x = opt.strides

    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))

    p, o, c = np.ogrid[:2, :h, :n]

    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)

    np.einsum('ii->i', opt)[...] = 1

    hh[...] = hh.cumsum(axis=1)

    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]

    return np.maximum.accumulate(opt, axis=1)



def stresstest(n=100):

    from string import ascii_lowercase

    import random

    from timeit import timeit

    Tv, Tl = 0, 0

    for i in range(n):

        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))

        print(s, end=' ')

        assert np.all(vect(s) == loopy(s))

        Tv += timeit(lambda: vect(s), number=10)

        Tl += timeit(lambda: loopy(s), number=10)

    print()

    print(f"total time loopy {Tl}, vect {Tv}")

Demo:

>>> stresstest(20)

caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce 

total time loopy 0.2523909523151815, vect 0.03500175685621798

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

add a comment |

Here is a vectorized solution.

It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.

Step-by-step explanation:

evaluate s[i] == s[j] in the diagonal view.

only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction

replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction

finally, take the cumulative maximum in bottom up and left right direction

import numpy as np



def loopy(s):

    n = len(s)

    opt = np.zeros(shape=(n, n), dtype=int)

    for j in range(0, n):

        for i in range(j, -1, -1):

            if j - i == 0:

                opt[i, j] = 1

            elif j - i == 1:

                opt[i, j] = 2 if s[i] == s[j] else 1

            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

                opt[i, j] = 2 + opt[i + 1, j - 1]

            else:

                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

    return opt



def vect(s):

    n = len(s)

    h = (n+1) // 2

    s = np.array([s, s]).view('U1').ravel()

    opt = np.zeros((n+2*h-1, n+2*h-1), int)

    y, x = opt.strides

    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))

    p, o, c = np.ogrid[:2, :h, :n]

    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)

    np.einsum('ii->i', opt)[...] = 1

    hh[...] = hh.cumsum(axis=1)

    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]

    return np.maximum.accumulate(opt, axis=1)



def stresstest(n=100):

    from string import ascii_lowercase

    import random

    from timeit import timeit

    Tv, Tl = 0, 0

    for i in range(n):

        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))

        print(s, end=' ')

        assert np.all(vect(s) == loopy(s))

        Tv += timeit(lambda: vect(s), number=10)

        Tl += timeit(lambda: loopy(s), number=10)

    print()

    print(f"total time loopy {Tl}, vect {Tv}")

Demo:

>>> stresstest(20)

caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce 

total time loopy 0.2523909523151815, vect 0.03500175685621798

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

Here is a vectorized solution.

It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.

Step-by-step explanation:

evaluate s[i] == s[j] in the diagonal view.

only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction

replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction

finally, take the cumulative maximum in bottom up and left right direction

import numpy as np



def loopy(s):

    n = len(s)

    opt = np.zeros(shape=(n, n), dtype=int)

    for j in range(0, n):

        for i in range(j, -1, -1):

            if j - i == 0:

                opt[i, j] = 1

            elif j - i == 1:

                opt[i, j] = 2 if s[i] == s[j] else 1

            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:

                opt[i, j] = 2 + opt[i + 1, j - 1]

            else:

                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

    return opt



def vect(s):

    n = len(s)

    h = (n+1) // 2

    s = np.array([s, s]).view('U1').ravel()

    opt = np.zeros((n+2*h-1, n+2*h-1), int)

    y, x = opt.strides

    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))

    p, o, c = np.ogrid[:2, :h, :n]

    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)

    np.einsum('ii->i', opt)[...] = 1

    hh[...] = hh.cumsum(axis=1)

    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]

    return np.maximum.accumulate(opt, axis=1)



def stresstest(n=100):

    from string import ascii_lowercase

    import random

    from timeit import timeit

    Tv, Tl = 0, 0

    for i in range(n):

        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))

        print(s, end=' ')

        assert np.all(vect(s) == loopy(s))

        Tv += timeit(lambda: vect(s), number=10)

        Tl += timeit(lambda: loopy(s), number=10)

    print()

    print(f"total time loopy {Tl}, vect {Tv}")

Demo:

>>> stresstest(20)

caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce 

total time loopy 0.2523909523151815, vect 0.03500175685621798

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

answered Nov 22 '18 at 2:56

Paul Panzer

29.8k21240

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu