Speed Up Nested For Loops with NumPy












1















I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:



s = # some string of size n
opt = numpy.zeros(shape=(n, n))

for j in range(0, n):
for i in range(j, -1, -1):
if j - i == 0:
opt[i, j] = 1
elif j - i == 1:
opt[i, j] = 2 if s[i] == s[j] else 1
elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
opt[i, j] = 2 + opt[i + 1, j - 1]
else:
opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])


Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where and numpy.fill to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy libraries to make it better optimized for Python?










share|improve this question



























    1















    I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:



    s = # some string of size n
    opt = numpy.zeros(shape=(n, n))

    for j in range(0, n):
    for i in range(j, -1, -1):
    if j - i == 0:
    opt[i, j] = 1
    elif j - i == 1:
    opt[i, j] = 2 if s[i] == s[j] else 1
    elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
    opt[i, j] = 2 + opt[i + 1, j - 1]
    else:
    opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])


    Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where and numpy.fill to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy libraries to make it better optimized for Python?










    share|improve this question

























      1












      1








      1








      I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:



      s = # some string of size n
      opt = numpy.zeros(shape=(n, n))

      for j in range(0, n):
      for i in range(j, -1, -1):
      if j - i == 0:
      opt[i, j] = 1
      elif j - i == 1:
      opt[i, j] = 2 if s[i] == s[j] else 1
      elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
      opt[i, j] = 2 + opt[i + 1, j - 1]
      else:
      opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])


      Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where and numpy.fill to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy libraries to make it better optimized for Python?










      share|improve this question














      I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:



      s = # some string of size n
      opt = numpy.zeros(shape=(n, n))

      for j in range(0, n):
      for i in range(j, -1, -1):
      if j - i == 0:
      opt[i, j] = 1
      elif j - i == 1:
      opt[i, j] = 2 if s[i] == s[j] else 1
      elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
      opt[i, j] = 2 + opt[i + 1, j - 1]
      else:
      opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])


      Unfortunately, this code is extremely slow for large values of N. I found that it is much better to use built in functions such as numpy.where and numpy.fill to fill in the values of the array as opposed to for loops, but I'm struggling to find any examples which explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy libraries to make it better optimized for Python?







      python numpy optimization






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 22 '18 at 0:38









      user3250889user3250889

      767




      767
























          3 Answers
          3






          active

          oldest

          votes


















          1














          I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.



          For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.



          Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.






          share|improve this answer































            0














            Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.



            If you instead had the simpler loop:



            for j in range(0, n):
            for i in range(j, -1, -1):
            if j - i == 0:
            opt[i, j] = 1
            elif j - i == 1:
            opt[i, j] = 2
            elif s[i] == s[j]:
            opt[i, j] = 3
            else:
            opt[i, j] = 4


            you could construct boolean arrays (using some broadcasting) that represent your three conditions:



            import numpy as np

            # get arrays i and j that represent the row and column indices
            i,j = np.ogrid[:n, :n]
            # construct an array with the characters from s
            sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

            cond1 = i==j # result will be a bool arr with True wherever row index equals column index
            cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
            cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]


            You could then use numpy.select to construct your desired opt:



            opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)


            For n=5 and s='abbca', this would yield:



            array([[1, 2, 4, 4, 3],
            [4, 1, 2, 4, 4],
            [4, 3, 1, 2, 4],
            [4, 4, 4, 1, 2],
            [3, 4, 4, 4, 1]])





            share|improve this answer

































              0














              Here is a vectorized solution.



              It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.



              Step-by-step explanation:




              • evaluate s[i] == s[j] in the diagonal view.


              • only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction


              • replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction


              • finally, take the cumulative maximum in bottom up and left right direction



              As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).



              import numpy as np

              def loopy(s):
              n = len(s)
              opt = np.zeros(shape=(n, n), dtype=int)
              for j in range(0, n):
              for i in range(j, -1, -1):
              if j - i == 0:
              opt[i, j] = 1
              elif j - i == 1:
              opt[i, j] = 2 if s[i] == s[j] else 1
              elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
              opt[i, j] = 2 + opt[i + 1, j - 1]
              else:
              opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
              return opt

              def vect(s):
              n = len(s)
              h = (n+1) // 2
              s = np.array([s, s]).view('U1').ravel()
              opt = np.zeros((n+2*h-1, n+2*h-1), int)
              y, x = opt.strides
              hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
              p, o, c = np.ogrid[:2, :h, :n]
              hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
              np.einsum('ii->i', opt)[...] = 1
              hh[...] = hh.cumsum(axis=1)
              opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
              return np.maximum.accumulate(opt, axis=1)

              def stresstest(n=100):
              from string import ascii_lowercase
              import random
              from timeit import timeit
              Tv, Tl = 0, 0
              for i in range(n):
              s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
              print(s, end=' ')
              assert np.all(vect(s) == loopy(s))
              Tv += timeit(lambda: vect(s), number=10)
              Tl += timeit(lambda: loopy(s), number=10)
              print()
              print(f"total time loopy {Tl}, vect {Tv}")


              Demo:



              >>> stresstest(20)
              caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
              total time loopy 0.2523909523151815, vect 0.03500175685621798





              share|improve this answer























                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53422382%2fspeed-up-nested-for-loops-with-numpy%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                3 Answers
                3






                active

                oldest

                votes








                3 Answers
                3






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                1














                I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.



                For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.



                Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.






                share|improve this answer




























                  1














                  I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.



                  For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.



                  Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.






                  share|improve this answer


























                    1












                    1








                    1







                    I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.



                    For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.



                    Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.






                    share|improve this answer













                    I don't think that np.where and np.fill can solve your problem. np.where is used to return elements of a numpy array that satisfy a certain condition, but in your case, the condition is NOT on VALUES of the numpy array, but on the values from variables i and j.



                    For your particular question, I would recommend using Cython to optimize your code specially for larger values of N. Cython is basically an interface between Python and C. The beauty of Cython is that it allows you to keep your python syntax, but optimize it using C structures. It allows you to define types of variables in a C-like manner to speed up your computations. For example, defining i and j as integers using Cython will speed thing up quite considerably because the types of i and j are checked at every loop iteration.



                    Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 22 '18 at 1:32









                    user10664643user10664643

                    286




                    286

























                        0














                        Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.



                        If you instead had the simpler loop:



                        for j in range(0, n):
                        for i in range(j, -1, -1):
                        if j - i == 0:
                        opt[i, j] = 1
                        elif j - i == 1:
                        opt[i, j] = 2
                        elif s[i] == s[j]:
                        opt[i, j] = 3
                        else:
                        opt[i, j] = 4


                        you could construct boolean arrays (using some broadcasting) that represent your three conditions:



                        import numpy as np

                        # get arrays i and j that represent the row and column indices
                        i,j = np.ogrid[:n, :n]
                        # construct an array with the characters from s
                        sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

                        cond1 = i==j # result will be a bool arr with True wherever row index equals column index
                        cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
                        cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]


                        You could then use numpy.select to construct your desired opt:



                        opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)


                        For n=5 and s='abbca', this would yield:



                        array([[1, 2, 4, 4, 3],
                        [4, 1, 2, 4, 4],
                        [4, 3, 1, 2, 4],
                        [4, 4, 4, 1, 2],
                        [3, 4, 4, 4, 1]])





                        share|improve this answer






























                          0














                          Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.



                          If you instead had the simpler loop:



                          for j in range(0, n):
                          for i in range(j, -1, -1):
                          if j - i == 0:
                          opt[i, j] = 1
                          elif j - i == 1:
                          opt[i, j] = 2
                          elif s[i] == s[j]:
                          opt[i, j] = 3
                          else:
                          opt[i, j] = 4


                          you could construct boolean arrays (using some broadcasting) that represent your three conditions:



                          import numpy as np

                          # get arrays i and j that represent the row and column indices
                          i,j = np.ogrid[:n, :n]
                          # construct an array with the characters from s
                          sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

                          cond1 = i==j # result will be a bool arr with True wherever row index equals column index
                          cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
                          cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]


                          You could then use numpy.select to construct your desired opt:



                          opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)


                          For n=5 and s='abbca', this would yield:



                          array([[1, 2, 4, 4, 3],
                          [4, 1, 2, 4, 4],
                          [4, 3, 1, 2, 4],
                          [4, 4, 4, 1, 2],
                          [3, 4, 4, 4, 1]])





                          share|improve this answer




























                            0












                            0








                            0







                            Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.



                            If you instead had the simpler loop:



                            for j in range(0, n):
                            for i in range(j, -1, -1):
                            if j - i == 0:
                            opt[i, j] = 1
                            elif j - i == 1:
                            opt[i, j] = 2
                            elif s[i] == s[j]:
                            opt[i, j] = 3
                            else:
                            opt[i, j] = 4


                            you could construct boolean arrays (using some broadcasting) that represent your three conditions:



                            import numpy as np

                            # get arrays i and j that represent the row and column indices
                            i,j = np.ogrid[:n, :n]
                            # construct an array with the characters from s
                            sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

                            cond1 = i==j # result will be a bool arr with True wherever row index equals column index
                            cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
                            cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]


                            You could then use numpy.select to construct your desired opt:



                            opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)


                            For n=5 and s='abbca', this would yield:



                            array([[1, 2, 4, 4, 3],
                            [4, 1, 2, 4, 4],
                            [4, 3, 1, 2, 4],
                            [4, 4, 4, 1, 2],
                            [3, 4, 4, 4, 1]])





                            share|improve this answer















                            Your if statements and the left-hand sides of your assignment statements contain references to the array that you're modifying in the loop. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.



                            If you instead had the simpler loop:



                            for j in range(0, n):
                            for i in range(j, -1, -1):
                            if j - i == 0:
                            opt[i, j] = 1
                            elif j - i == 1:
                            opt[i, j] = 2
                            elif s[i] == s[j]:
                            opt[i, j] = 3
                            else:
                            opt[i, j] = 4


                            you could construct boolean arrays (using some broadcasting) that represent your three conditions:



                            import numpy as np

                            # get arrays i and j that represent the row and column indices
                            i,j = np.ogrid[:n, :n]
                            # construct an array with the characters from s
                            sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

                            cond1 = i==j # result will be a bool arr with True wherever row index equals column index
                            cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
                            cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]


                            You could then use numpy.select to construct your desired opt:



                            opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)


                            For n=5 and s='abbca', this would yield:



                            array([[1, 2, 4, 4, 3],
                            [4, 1, 2, 4, 4],
                            [4, 3, 1, 2, 4],
                            [4, 4, 4, 1, 2],
                            [3, 4, 4, 4, 1]])






                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 22 '18 at 2:52

























                            answered Nov 22 '18 at 1:31









                            teltel

                            7,39621431




                            7,39621431























                                0














                                Here is a vectorized solution.



                                It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.



                                Step-by-step explanation:




                                • evaluate s[i] == s[j] in the diagonal view.


                                • only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction


                                • replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction


                                • finally, take the cumulative maximum in bottom up and left right direction



                                As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).



                                import numpy as np

                                def loopy(s):
                                n = len(s)
                                opt = np.zeros(shape=(n, n), dtype=int)
                                for j in range(0, n):
                                for i in range(j, -1, -1):
                                if j - i == 0:
                                opt[i, j] = 1
                                elif j - i == 1:
                                opt[i, j] = 2 if s[i] == s[j] else 1
                                elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                                opt[i, j] = 2 + opt[i + 1, j - 1]
                                else:
                                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
                                return opt

                                def vect(s):
                                n = len(s)
                                h = (n+1) // 2
                                s = np.array([s, s]).view('U1').ravel()
                                opt = np.zeros((n+2*h-1, n+2*h-1), int)
                                y, x = opt.strides
                                hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
                                p, o, c = np.ogrid[:2, :h, :n]
                                hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
                                np.einsum('ii->i', opt)[...] = 1
                                hh[...] = hh.cumsum(axis=1)
                                opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
                                return np.maximum.accumulate(opt, axis=1)

                                def stresstest(n=100):
                                from string import ascii_lowercase
                                import random
                                from timeit import timeit
                                Tv, Tl = 0, 0
                                for i in range(n):
                                s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
                                print(s, end=' ')
                                assert np.all(vect(s) == loopy(s))
                                Tv += timeit(lambda: vect(s), number=10)
                                Tl += timeit(lambda: loopy(s), number=10)
                                print()
                                print(f"total time loopy {Tl}, vect {Tv}")


                                Demo:



                                >>> stresstest(20)
                                caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
                                total time loopy 0.2523909523151815, vect 0.03500175685621798





                                share|improve this answer




























                                  0














                                  Here is a vectorized solution.



                                  It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.



                                  Step-by-step explanation:




                                  • evaluate s[i] == s[j] in the diagonal view.


                                  • only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction


                                  • replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction


                                  • finally, take the cumulative maximum in bottom up and left right direction



                                  As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).



                                  import numpy as np

                                  def loopy(s):
                                  n = len(s)
                                  opt = np.zeros(shape=(n, n), dtype=int)
                                  for j in range(0, n):
                                  for i in range(j, -1, -1):
                                  if j - i == 0:
                                  opt[i, j] = 1
                                  elif j - i == 1:
                                  opt[i, j] = 2 if s[i] == s[j] else 1
                                  elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                                  opt[i, j] = 2 + opt[i + 1, j - 1]
                                  else:
                                  opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
                                  return opt

                                  def vect(s):
                                  n = len(s)
                                  h = (n+1) // 2
                                  s = np.array([s, s]).view('U1').ravel()
                                  opt = np.zeros((n+2*h-1, n+2*h-1), int)
                                  y, x = opt.strides
                                  hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
                                  p, o, c = np.ogrid[:2, :h, :n]
                                  hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
                                  np.einsum('ii->i', opt)[...] = 1
                                  hh[...] = hh.cumsum(axis=1)
                                  opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
                                  return np.maximum.accumulate(opt, axis=1)

                                  def stresstest(n=100):
                                  from string import ascii_lowercase
                                  import random
                                  from timeit import timeit
                                  Tv, Tl = 0, 0
                                  for i in range(n):
                                  s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
                                  print(s, end=' ')
                                  assert np.all(vect(s) == loopy(s))
                                  Tv += timeit(lambda: vect(s), number=10)
                                  Tl += timeit(lambda: loopy(s), number=10)
                                  print()
                                  print(f"total time loopy {Tl}, vect {Tv}")


                                  Demo:



                                  >>> stresstest(20)
                                  caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
                                  total time loopy 0.2523909523151815, vect 0.03500175685621798





                                  share|improve this answer


























                                    0












                                    0








                                    0







                                    Here is a vectorized solution.



                                    It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.



                                    Step-by-step explanation:




                                    • evaluate s[i] == s[j] in the diagonal view.


                                    • only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction


                                    • replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction


                                    • finally, take the cumulative maximum in bottom up and left right direction



                                    As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).



                                    import numpy as np

                                    def loopy(s):
                                    n = len(s)
                                    opt = np.zeros(shape=(n, n), dtype=int)
                                    for j in range(0, n):
                                    for i in range(j, -1, -1):
                                    if j - i == 0:
                                    opt[i, j] = 1
                                    elif j - i == 1:
                                    opt[i, j] = 2 if s[i] == s[j] else 1
                                    elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                                    opt[i, j] = 2 + opt[i + 1, j - 1]
                                    else:
                                    opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
                                    return opt

                                    def vect(s):
                                    n = len(s)
                                    h = (n+1) // 2
                                    s = np.array([s, s]).view('U1').ravel()
                                    opt = np.zeros((n+2*h-1, n+2*h-1), int)
                                    y, x = opt.strides
                                    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
                                    p, o, c = np.ogrid[:2, :h, :n]
                                    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
                                    np.einsum('ii->i', opt)[...] = 1
                                    hh[...] = hh.cumsum(axis=1)
                                    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
                                    return np.maximum.accumulate(opt, axis=1)

                                    def stresstest(n=100):
                                    from string import ascii_lowercase
                                    import random
                                    from timeit import timeit
                                    Tv, Tl = 0, 0
                                    for i in range(n):
                                    s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
                                    print(s, end=' ')
                                    assert np.all(vect(s) == loopy(s))
                                    Tv += timeit(lambda: vect(s), number=10)
                                    Tl += timeit(lambda: loopy(s), number=10)
                                    print()
                                    print(f"total time loopy {Tl}, vect {Tv}")


                                    Demo:



                                    >>> stresstest(20)
                                    caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
                                    total time loopy 0.2523909523151815, vect 0.03500175685621798





                                    share|improve this answer













                                    Here is a vectorized solution.



                                    It creates diagonal views into the output array which allow us to do accumulation in diagonal direction.



                                    Step-by-step explanation:




                                    • evaluate s[i] == s[j] in the diagonal view.


                                    • only keep those which are connected to the main or first sub- diagonal by a series of Trues in top right to bottom left direction


                                    • replace all Trues with 2s except the main diagonal which gets 1s instead; take the cumulative sum in bottom left to top right direction


                                    • finally, take the cumulative maximum in bottom up and left right direction



                                    As it is not totally obvious this does the same as the loopy code I've tested on quite a few examples (using function stresstest below) and it seems correct. And is roughly 7x faster for moderately large strings (1-100 characters).



                                    import numpy as np

                                    def loopy(s):
                                    n = len(s)
                                    opt = np.zeros(shape=(n, n), dtype=int)
                                    for j in range(0, n):
                                    for i in range(j, -1, -1):
                                    if j - i == 0:
                                    opt[i, j] = 1
                                    elif j - i == 1:
                                    opt[i, j] = 2 if s[i] == s[j] else 1
                                    elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                                    opt[i, j] = 2 + opt[i + 1, j - 1]
                                    else:
                                    opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
                                    return opt

                                    def vect(s):
                                    n = len(s)
                                    h = (n+1) // 2
                                    s = np.array([s, s]).view('U1').ravel()
                                    opt = np.zeros((n+2*h-1, n+2*h-1), int)
                                    y, x = opt.strides
                                    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
                                    p, o, c = np.ogrid[:2, :h, :n]
                                    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
                                    np.einsum('ii->i', opt)[...] = 1
                                    hh[...] = hh.cumsum(axis=1)
                                    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
                                    return np.maximum.accumulate(opt, axis=1)

                                    def stresstest(n=100):
                                    from string import ascii_lowercase
                                    import random
                                    from timeit import timeit
                                    Tv, Tl = 0, 0
                                    for i in range(n):
                                    s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
                                    print(s, end=' ')
                                    assert np.all(vect(s) == loopy(s))
                                    Tv += timeit(lambda: vect(s), number=10)
                                    Tl += timeit(lambda: loopy(s), number=10)
                                    print()
                                    print(f"total time loopy {Tl}, vect {Tv}")


                                    Demo:



                                    >>> stresstest(20)
                                    caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
                                    total time loopy 0.2523909523151815, vect 0.03500175685621798






                                    share|improve this answer












                                    share|improve this answer



                                    share|improve this answer










                                    answered Nov 22 '18 at 2:56









                                    Paul PanzerPaul Panzer

                                    29.8k21240




                                    29.8k21240






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to Stack Overflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53422382%2fspeed-up-nested-for-loops-with-numpy%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        MongoDB - Not Authorized To Execute Command

                                        in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith

                                        Npm cannot find a required file even through it is in the searched directory