Using itertools.tee to duplicate a nested iterator (i.e. itertools.groupby)
I'm reading a file (while doing some expensive logic) that I will need to iterate several times in different functions, so I really want to read and parse the file only once.
The parsing function parses the file and returns an itertools.groupby object.
def parse_file():
    ...
    return itertools.groupby(lines, key=keyfunc)
I thought about doing the following:
csv_file_content = parse_file()
file_content_1, file_content_2 = itertools.tee(csv_file_content, 2)
foo(file_content_1)
bar(file_content_2)
However, itertools.tee seems to be able to "duplicate" only the outer iterator, while the inner (nested) iterators still refer to the originals (hence they are already exhausted after iterating over the first iterator returned by itertools.tee).
Standalone MCVE:
from itertools import groupby, tee

li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]
groupby_obj = groupby(li, key=lambda x: x['name'])
tee_obj1, tee_obj2 = tee(groupby_obj, 2)

print(id(tee_obj1))
for group, data in tee_obj1:
    print(group)
    print(id(data))
    for i in data:
        print(i)
print('----')
print(id(tee_obj2))
for group, data in tee_obj2:
    print(group)
    print(id(data))
    for i in data:
        print(i)
Outputs
2380054450440
a
2380053623136
{'name': 'a', 'id': 1}
{'name': 'a', 'id': 2}
b
2380030915976
{'name': 'b', 'id': 3}
{'name': 'b', 'id': 4}
c
2380054184344
{'name': 'c', 'id': 5}
{'name': 'c', 'id': 6}
----
2380064387336
a
2380053623136 # same ID as above
b
2380030915976 # same ID as above
c
2380054184344 # same ID as above
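The repeated IDs hint at what is going on: tee buffers the objects produced by the source iterator and hands the very same objects to each branch; it does not copy them. A minimal sketch with a hand-made nested iterator (the names inner, branch_a and branch_b are made up for illustration) reproduces the effect without groupby:

```python
from itertools import tee

# A one-element outer iterator whose only item is itself an iterator.
inner = iter([1, 2])
branch_a, branch_b = tee(iter([inner]), 2)

from_a = next(branch_a)
from_b = next(branch_b)

# Both branches hand back the very same inner iterator object.
same_object = from_a is from_b

first_pass = list(from_a)   # consumes the shared inner iterator
second_pass = list(from_b)  # nothing is left to read

print(same_object, first_pass, second_pass)  # prints: True [1, 2] []
```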
How can we efficiently duplicate a nested iterator?
python iterator itertools
But if you tee the inner iterator, wouldn't you be reading the file twice?
– Daniel Mesejo
Jan 1 at 10:20
You'd probably be better off by hardcoding everything into lists.
– Jean-François Fabre
Jan 1 at 10:23
It seems grouped_object, even in tee, cannot be used twice. This parallel assignment doesn't work: tee_obj1, tee_obj2 = groupby_obj, groupby_obj. But I guess this gives the expected result: tee_obj1, tee_obj2 = copy.deepcopy(groupby_obj), groupby_obj.
– iGian
Jan 1 at 10:30
"How to recursively copy an iterator" cannot be properly answered (or there is no solution), as discussed here: stackoverflow.com/questions/42132731/… But in your case it seems that deepcopy solves it.
– Jean-François Fabre
Jan 1 at 10:46
Question asked Jan 1 at 9:59 by DeepSpace; edited Jan 1 at 10:07.
1 Answer
It seems that grouped_object (class 'itertools.groupby') can be consumed only once, even through itertools.tee.
Parallel assignment of the same grouped_object doesn't work either, because both names end up bound to the same iterator:
tee_obj1, tee_obj2 = groupby_obj, groupby_obj
What does work is a deep copy of the grouped_object:
import copy
tee_obj1, tee_obj2 = copy.deepcopy(groupby_obj), groupby_obj
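If holding the parsed data in memory is acceptable, an alternative that does not depend on copying itertools objects is to store each group as a list once and pass that structure to every consumer. This is only a sketch: materialize_groups is a hypothetical helper, and li is the sample data from the question. As far as I know, copy support for itertools objects was never documented and has been deprecated since Python 3.12, so the deepcopy approach may stop working on newer versions.

```python
from itertools import groupby


def materialize_groups(iterable, key):
    """Hypothetical helper: group consecutive items and store each group as a list."""
    return [(k, list(group)) for k, group in groupby(iterable, key=key)]


li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]

groups = materialize_groups(li, key=lambda x: x['name'])

# The result is an ordinary list, so it can be iterated any number of times.
first_pass = [(name, [item['id'] for item in items]) for name, items in groups]
second_pass = [(name, [item['id'] for item in items]) for name, items in groups]

print(first_pass)
print(first_pass == second_pass)
```

Both foo and bar from the question could then receive groups directly, with no tee involved.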
"It seems like grouped_object (class 'itertools.groupby') be consumed once, even in itertools.tee": I don't think this is true, otherwise a b c would not have been output the second time. I'll accept this answer, though I was hoping to use something more elegant than deepcopy. Thanks!
– DeepSpace
Jan 1 at 11:12
What is true is that the grouper objects returned as "values" of each groupby iteration cannot be tee'd.
– Jean-François Fabre
Jan 1 at 15:20
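Following that remark, one possible workaround is to turn each grouper into a tuple before it reaches tee, so that tee buffers plain data rather than one-shot iterators. A sketch, with frozen_groups as a made-up name and li as the sample data from the question:

```python
from itertools import groupby, tee


def frozen_groups(iterable, key):
    """Hypothetical helper: lazily yield (key, tuple_of_items) pairs."""
    for k, group in groupby(iterable, key=key):
        # Freeze the group before groupby advances and invalidates it.
        yield k, tuple(group)


li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]

tee_obj1, tee_obj2 = tee(frozen_groups(li, key=lambda x: x['name']), 2)

# Each branch now sees complete groups, because the tuples are reusable.
ids_1 = [[item['id'] for item in data] for _, data in tee_obj1]
ids_2 = [[item['id'] for item in data] for _, data in tee_obj2]

print(ids_1)
print(ids_2)
```

Note that when one branch is consumed completely before the other, as with foo and bar in the question, tee has to buffer every group, so memory use is comparable to building a list up front.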
Answered Jan 1 at 10:54 by iGian; edited Jan 1 at 11:15 by DeepSpace.