Optimizing dictionary loop summing values












0















I have a method that takes a nested dict input_dict



final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier

return final


This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?










share|improve this question




















  • 1





    correct. This is 2.7

    – ARB
    Dec 31 '18 at 20:32











  • What does self._get_age(key) does?

    – Daniel Mesejo
    Dec 31 '18 at 20:33











  • Takes a date period: return (self.current_period - period) / 7

    – ARB
    Dec 31 '18 at 20:35













  • Where is period the key?

    – Daniel Mesejo
    Dec 31 '18 at 20:36











  • input_dict has keys which are dates, so each key is the argument to self._get_age(key)

    – ARB
    Dec 31 '18 at 20:38
















0















I have a method that takes a nested dict input_dict



final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier

return final


This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?










share|improve this question




















  • 1





    correct. This is 2.7

    – ARB
    Dec 31 '18 at 20:32











  • What does self._get_age(key) does?

    – Daniel Mesejo
    Dec 31 '18 at 20:33











  • Takes a date period: return (self.current_period - period) / 7

    – ARB
    Dec 31 '18 at 20:35













  • Where is period the key?

    – Daniel Mesejo
    Dec 31 '18 at 20:36











  • input_dict has keys which are dates, so each key is the argument to self._get_age(key)

    – ARB
    Dec 31 '18 at 20:38














0












0








0








I have a method that takes a nested dict input_dict



final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier

return final


This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?










share|improve this question
















I have a method that takes a nested dict input_dict



final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier

return final


This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?







python algorithm performance optimization python-2.x






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 31 '18 at 20:32









Daniel Mesejo

18.7k21433




18.7k21433










asked Dec 31 '18 at 20:30









ARBARB

1




1








  • 1





    correct. This is 2.7

    – ARB
    Dec 31 '18 at 20:32











  • What does self._get_age(key) does?

    – Daniel Mesejo
    Dec 31 '18 at 20:33











  • Takes a date period: return (self.current_period - period) / 7

    – ARB
    Dec 31 '18 at 20:35













  • Where is period the key?

    – Daniel Mesejo
    Dec 31 '18 at 20:36











  • input_dict has keys which are dates, so each key is the argument to self._get_age(key)

    – ARB
    Dec 31 '18 at 20:38














  • 1





    correct. This is 2.7

    – ARB
    Dec 31 '18 at 20:32











  • What does self._get_age(key) does?

    – Daniel Mesejo
    Dec 31 '18 at 20:33











  • Takes a date period: return (self.current_period - period) / 7

    – ARB
    Dec 31 '18 at 20:35













  • Where is period the key?

    – Daniel Mesejo
    Dec 31 '18 at 20:36











  • input_dict has keys which are dates, so each key is the argument to self._get_age(key)

    – ARB
    Dec 31 '18 at 20:38








1




1





correct. This is 2.7

– ARB
Dec 31 '18 at 20:32





correct. This is 2.7

– ARB
Dec 31 '18 at 20:32













What does self._get_age(key) does?

– Daniel Mesejo
Dec 31 '18 at 20:33





What does self._get_age(key) does?

– Daniel Mesejo
Dec 31 '18 at 20:33













Takes a date period: return (self.current_period - period) / 7

– ARB
Dec 31 '18 at 20:35







Takes a date period: return (self.current_period - period) / 7

– ARB
Dec 31 '18 at 20:35















Where is period the key?

– Daniel Mesejo
Dec 31 '18 at 20:36





Where is period the key?

– Daniel Mesejo
Dec 31 '18 at 20:36













input_dict has keys which are dates, so each key is the argument to self._get_age(key)

– ARB
Dec 31 '18 at 20:38





input_dict has keys which are dates, so each key is the argument to self._get_age(key)

– ARB
Dec 31 '18 at 20:38












2 Answers
2






active

oldest

votes


















0














The built-in sum function is typically faster than writing out a for loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum:



items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)





share|improve this answer


























  • Have you profiled this?

    – juanpa.arrivillaga
    Dec 31 '18 at 20:52











  • I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

    – Daniel Mesejo
    Dec 31 '18 at 20:52











  • I have not profiled this, just an idea.

    – qxz
    Dec 31 '18 at 20:53











  • Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

    – juanpa.arrivillaga
    Dec 31 '18 at 20:54













  • Maybe replacing the generator expressions with list comprehension would be faster?

    – qxz
    Dec 31 '18 at 21:02



















0














Perhaps consider something like the following-



current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)


Here I pull the multiplication by self.multiplier out of the loop, and replace the comparison 0 < age < MAX_VALUE with an equivalent comparison of precomputed values, obtained by substituting age with your _get_age() method described in the comments and solving for key. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE. Additionally, this allows us to make use of the builtin sum() function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.



Note that I assume (self.current_period - period) in your _get_age() method is an integer, and so / 7 floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7) from the current_period assignment for equivalent functionality.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991199%2foptimizing-dictionary-loop-summing-values%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    The built-in sum function is typically faster than writing out a for loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum:



    items = (
    (key,value,self._get_age(key))
    for key,value in input_dict[self.state][self.city].iteritems()
    )
    return sum(
    value * self.lookup[key][age] * self.multiplier
    for key,value,age in items
    if 0 < age < MAX_VAL
    )





    share|improve this answer


























    • Have you profiled this?

      – juanpa.arrivillaga
      Dec 31 '18 at 20:52











    • I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

      – Daniel Mesejo
      Dec 31 '18 at 20:52











    • I have not profiled this, just an idea.

      – qxz
      Dec 31 '18 at 20:53











    • Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

      – juanpa.arrivillaga
      Dec 31 '18 at 20:54













    • Maybe replacing the generator expressions with list comprehension would be faster?

      – qxz
      Dec 31 '18 at 21:02
















    0














    The built-in sum function is typically faster than writing out a for loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum:



    items = (
    (key,value,self._get_age(key))
    for key,value in input_dict[self.state][self.city].iteritems()
    )
    return sum(
    value * self.lookup[key][age] * self.multiplier
    for key,value,age in items
    if 0 < age < MAX_VAL
    )





    share|improve this answer


























    • Have you profiled this?

      – juanpa.arrivillaga
      Dec 31 '18 at 20:52











    • I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

      – Daniel Mesejo
      Dec 31 '18 at 20:52











    • I have not profiled this, just an idea.

      – qxz
      Dec 31 '18 at 20:53











    • Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

      – juanpa.arrivillaga
      Dec 31 '18 at 20:54













    • Maybe replacing the generator expressions with list comprehension would be faster?

      – qxz
      Dec 31 '18 at 21:02














    0












    0








    0







    The built-in sum function is typically faster than writing out a for loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum:



    items = (
    (key,value,self._get_age(key))
    for key,value in input_dict[self.state][self.city].iteritems()
    )
    return sum(
    value * self.lookup[key][age] * self.multiplier
    for key,value,age in items
    if 0 < age < MAX_VAL
    )





    share|improve this answer















    The built-in sum function is typically faster than writing out a for loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum:



    items = (
    (key,value,self._get_age(key))
    for key,value in input_dict[self.state][self.city].iteritems()
    )
    return sum(
    value * self.lookup[key][age] * self.multiplier
    for key,value,age in items
    if 0 < age < MAX_VAL
    )






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 31 '18 at 20:54

























    answered Dec 31 '18 at 20:50









    qxzqxz

    3,2911926




    3,2911926













    • Have you profiled this?

      – juanpa.arrivillaga
      Dec 31 '18 at 20:52











    • I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

      – Daniel Mesejo
      Dec 31 '18 at 20:52











    • I have not profiled this, just an idea.

      – qxz
      Dec 31 '18 at 20:53











    • Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

      – juanpa.arrivillaga
      Dec 31 '18 at 20:54













    • Maybe replacing the generator expressions with list comprehension would be faster?

      – qxz
      Dec 31 '18 at 21:02



















    • Have you profiled this?

      – juanpa.arrivillaga
      Dec 31 '18 at 20:52











    • I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

      – Daniel Mesejo
      Dec 31 '18 at 20:52











    • I have not profiled this, just an idea.

      – qxz
      Dec 31 '18 at 20:53











    • Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

      – juanpa.arrivillaga
      Dec 31 '18 at 20:54













    • Maybe replacing the generator expressions with list comprehension would be faster?

      – qxz
      Dec 31 '18 at 21:02

















    Have you profiled this?

    – juanpa.arrivillaga
    Dec 31 '18 at 20:52





    Have you profiled this?

    – juanpa.arrivillaga
    Dec 31 '18 at 20:52













    I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

    – Daniel Mesejo
    Dec 31 '18 at 20:52





    I believe you could use chained comparison, do not know if it will be faster, but is more pythonic

    – Daniel Mesejo
    Dec 31 '18 at 20:52













    I have not profiled this, just an idea.

    – qxz
    Dec 31 '18 at 20:53





    I have not profiled this, just an idea.

    – qxz
    Dec 31 '18 at 20:53













    Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

    – juanpa.arrivillaga
    Dec 31 '18 at 20:54







    Well, sum is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.

    – juanpa.arrivillaga
    Dec 31 '18 at 20:54















    Maybe replacing the generator expressions with list comprehension would be faster?

    – qxz
    Dec 31 '18 at 21:02





    Maybe replacing the generator expressions with list comprehension would be faster?

    – qxz
    Dec 31 '18 at 21:02













    0














    Perhaps consider something like the following-



    current_period = self.current_period - (self.current_period % 7)
    MIN_VALUE = current_period - 7 * MAX_VALUE
    return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
    for key, value in input_dict[self.state][self.city].iteritems()
    if MIN_VALUE < key < current_period
    )


    Here I pull the multiplication by self.multiplier out of the loop, and replace the comparison 0 < age < MAX_VALUE with an equivalent comparison of precomputed values, obtained by substituting age with your _get_age() method described in the comments and solving for key. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE. Additionally, this allows us to make use of the builtin sum() function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.



    Note that I assume (self.current_period - period) in your _get_age() method is an integer, and so / 7 floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7) from the current_period assignment for equivalent functionality.






    share|improve this answer




























      0














      Perhaps consider something like the following-



      current_period = self.current_period - (self.current_period % 7)
      MIN_VALUE = current_period - 7 * MAX_VALUE
      return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
      for key, value in input_dict[self.state][self.city].iteritems()
      if MIN_VALUE < key < current_period
      )


      Here I pull the multiplication by self.multiplier out of the loop, and replace the comparison 0 < age < MAX_VALUE with an equivalent comparison of precomputed values, obtained by substituting age with your _get_age() method described in the comments and solving for key. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE. Additionally, this allows us to make use of the builtin sum() function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.



      Note that I assume (self.current_period - period) in your _get_age() method is an integer, and so / 7 floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7) from the current_period assignment for equivalent functionality.






      share|improve this answer


























        0












        0








        0







        Perhaps consider something like the following-



        current_period = self.current_period - (self.current_period % 7)
        MIN_VALUE = current_period - 7 * MAX_VALUE
        return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
        for key, value in input_dict[self.state][self.city].iteritems()
        if MIN_VALUE < key < current_period
        )


        Here I pull the multiplication by self.multiplier out of the loop, and replace the comparison 0 < age < MAX_VALUE with an equivalent comparison of precomputed values, obtained by substituting age with your _get_age() method described in the comments and solving for key. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE. Additionally, this allows us to make use of the builtin sum() function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.



        Note that I assume (self.current_period - period) in your _get_age() method is an integer, and so / 7 floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7) from the current_period assignment for equivalent functionality.






        share|improve this answer













        Perhaps consider something like the following-



        current_period = self.current_period - (self.current_period % 7)
        MIN_VALUE = current_period - 7 * MAX_VALUE
        return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
        for key, value in input_dict[self.state][self.city].iteritems()
        if MIN_VALUE < key < current_period
        )


        Here I pull the multiplication by self.multiplier out of the loop, and replace the comparison 0 < age < MAX_VALUE with an equivalent comparison of precomputed values, obtained by substituting age with your _get_age() method described in the comments and solving for key. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE. Additionally, this allows us to make use of the builtin sum() function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.



        Note that I assume (self.current_period - period) in your _get_age() method is an integer, and so / 7 floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7) from the current_period assignment for equivalent functionality.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 1 at 20:59









        Dillon DavisDillon Davis

        3,0611626




        3,0611626






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991199%2foptimizing-dictionary-loop-summing-values%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            How to fix TextFormField cause rebuild widget in Flutter

            Npm cannot find a required file even through it is in the searched directory