Optimizing dictionary loop summing values
I have a method that takes a nested dict input_dict
final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier
return final
This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?
python algorithm performance optimization python-2.x
|
show 2 more comments
I have a method that takes a nested dict input_dict
final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier
return final
This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?
python algorithm performance optimization python-2.x
1
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
What doesself._get_age(key)
does?
– Daniel Mesejo
Dec 31 '18 at 20:33
Takes a dateperiod
:return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
input_dict
has keys which are dates, so each key is the argument toself._get_age(key)
– ARB
Dec 31 '18 at 20:38
|
show 2 more comments
I have a method that takes a nested dict input_dict
final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier
return final
This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?
python algorithm performance optimization python-2.x
I have a method that takes a nested dict input_dict
final = 0
for key, value in input_dict[self.state][self.city].iteritems():
age = self._get_age(key)
if (age > 0 and age < MAX_VAL):
final += value * self.lookup[key][age] * self.multiplier
return final
This runs in around .03 seconds, but given that in an example execution it gets called >10k times, it winds up being bottleneck and is responsible for about 50% of the runtime. Assuming I can't reduce the total number of times the method is called, does anyone have suggestions on how to improve this?
python algorithm performance optimization python-2.x
python algorithm performance optimization python-2.x
edited Dec 31 '18 at 20:32


Daniel Mesejo
18.7k21433
18.7k21433
asked Dec 31 '18 at 20:30
ARBARB
1
1
1
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
What doesself._get_age(key)
does?
– Daniel Mesejo
Dec 31 '18 at 20:33
Takes a dateperiod
:return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
input_dict
has keys which are dates, so each key is the argument toself._get_age(key)
– ARB
Dec 31 '18 at 20:38
|
show 2 more comments
1
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
What doesself._get_age(key)
does?
– Daniel Mesejo
Dec 31 '18 at 20:33
Takes a dateperiod
:return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
input_dict
has keys which are dates, so each key is the argument toself._get_age(key)
– ARB
Dec 31 '18 at 20:38
1
1
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
What does
self._get_age(key)
does?– Daniel Mesejo
Dec 31 '18 at 20:33
What does
self._get_age(key)
does?– Daniel Mesejo
Dec 31 '18 at 20:33
Takes a date
period
: return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Takes a date
period
: return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
input_dict
has keys which are dates, so each key is the argument to self._get_age(key)
– ARB
Dec 31 '18 at 20:38
input_dict
has keys which are dates, so each key is the argument to self._get_age(key)
– ARB
Dec 31 '18 at 20:38
|
show 2 more comments
2 Answers
2
active
oldest
votes
The built-in sum
function is typically faster than writing out a for
loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum
:
items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.
– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
|
show 1 more comment
Perhaps consider something like the following-
current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)
Here I pull the multiplication by self.multiplier
out of the loop, and replace the comparison 0 < age < MAX_VALUE
with an equivalent comparison of precomputed values, obtained by substituting age
with your _get_age()
method described in the comments and solving for key
. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE
, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE
. Additionally, this allows us to make use of the builtin sum()
function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.
Note that I assume (self.current_period - period)
in your _get_age()
method is an integer, and so / 7
floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7)
from the current_period
assignment for equivalent functionality.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991199%2foptimizing-dictionary-loop-summing-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The built-in sum
function is typically faster than writing out a for
loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum
:
items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.
– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
|
show 1 more comment
The built-in sum
function is typically faster than writing out a for
loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum
:
items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.
– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
|
show 1 more comment
The built-in sum
function is typically faster than writing out a for
loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum
:
items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)
The built-in sum
function is typically faster than writing out a for
loop. (See this question.) In your case, you could construct a generator expression of the values to be summed, and then pass that to sum
:
items = (
(key,value,self._get_age(key))
for key,value in input_dict[self.state][self.city].iteritems()
)
return sum(
value * self.lookup[key][age] * self.multiplier
for key,value,age in items
if 0 < age < MAX_VAL
)
edited Dec 31 '18 at 20:54
answered Dec 31 '18 at 20:50
qxzqxz
3,2911926
3,2911926
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.
– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
|
show 1 more comment
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.
– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
Have you profiled this?
– juanpa.arrivillaga
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I believe you could use chained comparison, do not know if it will be faster, but is more pythonic
– Daniel Mesejo
Dec 31 '18 at 20:52
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
I have not profiled this, just an idea.
– qxz
Dec 31 '18 at 20:53
Well,
sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.– juanpa.arrivillaga
Dec 31 '18 at 20:54
Well,
sum
is generally fast when using it on built-in data-types, generator expressions are typically slow. I bet a for-loop is faster.– juanpa.arrivillaga
Dec 31 '18 at 20:54
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
Maybe replacing the generator expressions with list comprehension would be faster?
– qxz
Dec 31 '18 at 21:02
|
show 1 more comment
Perhaps consider something like the following-
current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)
Here I pull the multiplication by self.multiplier
out of the loop, and replace the comparison 0 < age < MAX_VALUE
with an equivalent comparison of precomputed values, obtained by substituting age
with your _get_age()
method described in the comments and solving for key
. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE
, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE
. Additionally, this allows us to make use of the builtin sum()
function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.
Note that I assume (self.current_period - period)
in your _get_age()
method is an integer, and so / 7
floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7)
from the current_period
assignment for equivalent functionality.
add a comment |
Perhaps consider something like the following-
current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)
Here I pull the multiplication by self.multiplier
out of the loop, and replace the comparison 0 < age < MAX_VALUE
with an equivalent comparison of precomputed values, obtained by substituting age
with your _get_age()
method described in the comments and solving for key
. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE
, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE
. Additionally, this allows us to make use of the builtin sum()
function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.
Note that I assume (self.current_period - period)
in your _get_age()
method is an integer, and so / 7
floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7)
from the current_period
assignment for equivalent functionality.
add a comment |
Perhaps consider something like the following-
current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)
Here I pull the multiplication by self.multiplier
out of the loop, and replace the comparison 0 < age < MAX_VALUE
with an equivalent comparison of precomputed values, obtained by substituting age
with your _get_age()
method described in the comments and solving for key
. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE
, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE
. Additionally, this allows us to make use of the builtin sum()
function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.
Note that I assume (self.current_period - period)
in your _get_age()
method is an integer, and so / 7
floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7)
from the current_period
assignment for equivalent functionality.
Perhaps consider something like the following-
current_period = self.current_period - (self.current_period % 7)
MIN_VALUE = current_period - 7 * MAX_VALUE
return self.multiplier * sum(value * self.lookup[key][self._get_age(key)]
for key, value in input_dict[self.state][self.city].iteritems()
if MIN_VALUE < key < current_period
)
Here I pull the multiplication by self.multiplier
out of the loop, and replace the comparison 0 < age < MAX_VALUE
with an equivalent comparison of precomputed values, obtained by substituting age
with your _get_age()
method described in the comments and solving for key
. This allows us to skip the function call + extra computations for cases where age <= 0 or age >= MAX_VALUE
, and incurs no extra cost (save for computing the 2 variables outside the loop) over the original if 0 < age < MAX_VALUE
. Additionally, this allows us to make use of the builtin sum()
function, which is typically faster than summing via a for loop, but without creating a separate generator as in qxz's answer.
Note that I assume (self.current_period - period)
in your _get_age()
method is an integer, and so / 7
floors the result in Python-2.x. If this is not the case, remove the - (self.current_period % 7)
from the current_period
assignment for equivalent functionality.
answered Jan 1 at 20:59


Dillon DavisDillon Davis
3,0611626
3,0611626
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53991199%2foptimizing-dictionary-loop-summing-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
correct. This is 2.7
– ARB
Dec 31 '18 at 20:32
What does
self._get_age(key)
does?– Daniel Mesejo
Dec 31 '18 at 20:33
Takes a date
period
:return (self.current_period - period) / 7
– ARB
Dec 31 '18 at 20:35
Where is period the key?
– Daniel Mesejo
Dec 31 '18 at 20:36
input_dict
has keys which are dates, so each key is the argument toself._get_age(key)
– ARB
Dec 31 '18 at 20:38