Using itertools.tee to duplicate a nested iterator (ie itertools.groupby)












I'm reading a file (with some expensive parsing logic) whose contents I need to iterate over several times in different functions, so I really want to read and parse the file only once.



The parsing function parses the file and returns an itertools.groupby object.



def parse_file():
    ...
    return itertools.groupby(lines, key=keyfunc)


I thought about doing the following:



csv_file_content = parse_file()

file_content_1, file_content_2 = itertools.tee(csv_file_content, 2)

foo(file_content_1)
bar(file_content_2)


However, itertools.tee seems to only be able to "duplicate" the external iterator, while the internal (nested) iterator still refers to the original (hence it will be exhausted after iterating over the 1st iterator returned by itertools.tee).



Standalone MCVE:



from itertools import groupby, tee

li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]

groupby_obj = groupby(li, key=lambda x: x['name'])
tee_obj1, tee_obj2 = tee(groupby_obj, 2)

print(id(tee_obj1))
for group, data in tee_obj1:
    print(group)
    print(id(data))
    for i in data:
        print(i)

print('----')

print(id(tee_obj2))
for group, data in tee_obj2:
    print(group)
    print(id(data))
    for i in data:
        print(i)


Outputs



2380054450440
a
2380053623136
{'name': 'a', 'id': 1}
{'name': 'a', 'id': 2}
b
2380030915976
{'name': 'b', 'id': 3}
{'name': 'b', 'id': 4}
c
2380054184344
{'name': 'c', 'id': 5}
{'name': 'c', 'id': 6}
----
2380064387336
a
2380053623136 # same ID as above
b
2380030915976 # same ID as above
c
2380054184344 # same ID as above


How can we efficiently duplicate a nested iterator?










  • But if you tee the inner iterator, wouldn't you be reading the file twice?

    – Daniel Mesejo
    Jan 1 at 10:20

  • You'd probably be better off storing everything in lists.

    – Jean-François Fabre
    Jan 1 at 10:23

  • It seems the groupby object cannot be used twice, even through tee. Parallel assignment doesn't work either: tee_obj1, tee_obj2 = groupby_obj, groupby_obj. But I guess this gives the expected result: tee_obj1, tee_obj2 = copy.deepcopy(groupby_obj), groupby_obj.

    – iGian
    Jan 1 at 10:30

  • "How to recursively copy an iterator" cannot be properly answered (or there is no solution), as discussed here: stackoverflow.com/questions/42132731/… but in your case it seems that deepcopy solves it.

    – Jean-François Fabre
    Jan 1 at 10:46
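
The suggestion to store everything in lists could look roughly like the sketch below. It is only an illustration, not the asker's actual parser: `materialize_groups`, `foo`, and `bar` are hypothetical names, and it assumes every group fits comfortably in memory.

```python
from itertools import groupby


def materialize_groups(rows, keyfunc):
    """Run groupby once and store each group as a list (hypothetical helper)."""
    return [(key, list(group)) for key, group in groupby(rows, key=keyfunc)]


li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]

# The grouping work happens exactly once, here.
groups = materialize_groups(li, lambda row: row['name'])


def foo(groups):
    # Hypothetical consumer: collect the group keys.
    return [key for key, _ in groups]


def bar(groups):
    # Hypothetical consumer: collect the ids inside each group.
    return [[row['id'] for row in items] for _, items in groups]


print(foo(groups))
print(bar(groups))
```

Because `groups` is a plain list of `(key, list)` pairs, it can be passed to any number of functions and iterated repeatedly without tee or copying. The trade-off is that the laziness of groupby is lost.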


















python iterator itertools






asked Jan 1 at 9:59 by DeepSpace
edited Jan 1 at 10:07








1 Answer
It seems that grouped_object (class 'itertools.groupby') can be consumed only once, even through itertools.tee.
Parallel assignment of the same grouped_object doesn't work either, since both names refer to the same object:

tee_obj1, tee_obj2 = groupby_obj, groupby_obj

What does work is a deep copy of the grouped_object:

import copy
tee_obj1, tee_obj2 = copy.deepcopy(groupby_obj), groupby_obj
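
If deepcopy feels heavy-handed, one alternative is to cache the parsed rows once and build a fresh groupby for each consumer. The sketch below assumes the rows fit in memory; `parse_rows` and `grouped` are hypothetical names, and `parse_rows` merely stands in for the expensive parsing step. As far as I know, copy and deepcopy support for itertools objects was deprecated in Python 3.12 and removed in 3.14, so the deepcopy approach may not work on newer interpreters.

```python
from itertools import groupby


def parse_rows():
    # Hypothetical stand-in for the expensive read-and-parse step.
    return [{'name': 'a', 'id': 1},
            {'name': 'a', 'id': 2},
            {'name': 'b', 'id': 3},
            {'name': 'b', 'id': 4},
            {'name': 'c', 'id': 5},
            {'name': 'c', 'id': 6}]


rows = parse_rows()  # the expensive work happens once


def grouped():
    # Each call returns a fresh, independent groupby over the cached rows.
    return groupby(rows, key=lambda row: row['name'])


# Two separate passes, each with its own groupby object.
names = [key for key, _ in grouped()]
ids_by_name = {key: [row['id'] for row in group] for key, group in grouped()}

print(names)
print(ids_by_name)
```

The key function is re-evaluated on every pass, so this only pays off when grouping is cheap compared with parsing.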





  • "It seems like grouped_object (class 'itertools.groupby') [can] be consumed once, even in itertools.tee" I don't think this is true, otherwise a b c would not have been output the second time. I'll accept this answer, though I was hoping for something more elegant than deepcopy. Thanks!

    – DeepSpace
    Jan 1 at 11:12

  • What is true is that the grouper objects returned as "values" of each groupby iteration cannot be tee'd.

    – Jean-François Fabre
    Jan 1 at 15:20
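
The point about grouper objects can be checked with a small experiment. This is a sketch of the behaviour on CPython, using made-up string data rather than the dicts from the question: tee replays the same (key, grouper) tuples to both branches, so the second branch receives groupers that the first branch has already drained.

```python
from itertools import groupby, tee

rows = ['a1', 'a2', 'b1']  # illustrative data, grouped by first character
tee_obj1, tee_obj2 = tee(groupby(rows, key=lambda s: s[0]), 2)

# The first branch drives the underlying groupby and consumes each group.
first_pass = [(key, list(group)) for key, group in tee_obj1]

# The second branch gets the cached tuples, whose groupers are already spent.
second_pass = [(key, list(group)) for key, group in tee_obj2]

print(first_pass)
print(second_pass)
```

The keys survive in the second pass because they are plain values stored in tee's buffer. Only the nested group iterators come back empty, which matches the output shown in the question.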











answered Jan 1 at 10:54 by iGian
edited Jan 1 at 11:15 by DeepSpace







