Keras fit_generator when you don't know how many steps you'll have












B"H



Is there any way to implement a data generator (either by creating a Python generator or by subclassing Sequence) for training data if I don't know in advance how many records I will have in an epoch?



Often, even if you can't load the entire training set into memory, you still know how many items you will have: for images you can get a file count, or something similar. But sometimes you don't know in advance and it is too expensive to get an exact count, for example if you have multiple variable-length files, each with many records, or if you are being fed the training data by an outside source. You may even have more data at each epoch: by the time the first epoch is done, more data may have been collected.



So you don't know what number to pass to steps_per_epoch, and you have no real way to implement __len__.



I am currently working on a CNN, but this would be more common with an LSTM.
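To make the setup concrete, here is a minimal sketch of the kind of streaming generator described above. It is plain Python, not Keras code: the in-memory lists and the read_records helper are hypothetical stand-ins for files or remote resources that are expensive to open. The point it illustrates is that the number of batches is only known once the last resource has been read.

```python
from typing import Iterable, Iterator, List


def read_records(resource: List[int]) -> Iterator[int]:
    """Hypothetical reader: stands in for downloading or parsing one file."""
    for record in resource:
        yield record


def batches_for_one_pass(
    resources: Iterable[List[int]], batch_size: int
) -> Iterator[List[int]]:
    """Yield batches while touching each resource exactly once."""
    batch: List[int] = []
    for resource in resources:
        for record in read_records(resource):
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Final, possibly smaller batch at the end of the pass.
        yield batch


# Three "files" whose lengths are not known before reading them.
resources = [[1, 2, 3], [4], [5, 6, 7, 8, 9]]

# The step count only becomes available after the whole pass has run.
steps_seen = sum(1 for _ in batches_for_one_pass(resources, batch_size=4))
print(steps_seen)
```

A generator like this ends after one pass, which is exactly the situation where there is no value to hand over as steps_per_epoch up front.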










  • You need to define what an epoch is in this case: you either have to measure it (once) or define it in another way (if the data changes, for example).

    – Matias Valdenegro
    Jan 2 at 18:24











  • Thank you for the fast response. At the end, when I run out of data (or once I've opened the last file), I do know that the epoch is over. But by that point __len__ has already been read, and I can't just output an EOF or something.

    – Rabbi
    Jan 2 at 19:03








  • What is the issue with making a script that just computes how many data points there are? You need to compute this only once, and then you can hardcode steps per epoch as samples / batch_size.

    – Matias Valdenegro
    Jan 2 at 20:19






  • Running through (downloading or loading) each file to count how many samples it contains is too expensive, both computationally and financially. I need to be able to load each resource only once per epoch, so I don't know how many samples there will be in the current epoch until I reach the last resource.

    – Rabbi
    Jan 3 at 23:06
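One possible reading of the suggestion to "define it in another way" is to treat an epoch as a fixed number of steps rather than one full pass over the data. The sketch below is an assumption-laden illustration, not a confirmed answer from this thread: endless_batches restarts from the first resource whenever it runs out, list_resources is a hypothetical callable that is re-evaluated on every pass so newly collected data is picked up, and STEPS_PER_EPOCH is an arbitrary number chosen by the user. It uses plain Python lists; with Keras, each yielded batch would instead be an (inputs, targets) tuple of arrays.

```python
import itertools
from typing import Callable, Iterator, List


def endless_batches(
    list_resources: Callable[[], List[List[int]]], batch_size: int
) -> Iterator[List[int]]:
    """Loop over the data forever, re-listing the resources on every pass.

    Note: if list_resources() keeps returning nothing, this never yields.
    """
    batch: List[int] = []
    while True:
        for resource in list_resources():  # hypothetical: may grow over time
            for record in resource:
                batch.append(record)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
        # A partial batch is carried over into the next pass, not dropped.


data = [[1, 2, 3], [4, 5]]
gen = endless_batches(lambda: data, batch_size=2)

# An "epoch" is simply the next STEPS_PER_EPOCH batches from the stream.
STEPS_PER_EPOCH = 4
epoch = list(itertools.islice(gen, STEPS_PER_EPOCH))
print(epoch)
```

With this design each resource is still opened only once per pass, but epoch boundaries no longer line up with passes over the data, so per-epoch metrics describe a fixed amount of work rather than a full sweep.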
















python keras






asked Jan 2 at 17:53









Rabbi



















