Keras fit_generator when you don't know how many steps you'll have
B"H
Is there any way to implement a data generator (either by creating a Python generator or by subclassing Sequence) for training data if I don't know in advance how many records I will have in an epoch?
Often, even if you can't load the entire training set into memory, you still know how many items you will have: for images you can get a file count, or something similar. But sometimes you don't know in advance and it is too expensive to get an exact count, for example if you have multiple variable-length files each with many records, or if you are being fed the training data by an outside source. You may even have more data at each epoch: by the time the first epoch is done, more data may have been collected.
So you don't know what number to pass to steps_per_epoch, and you have no real way to implement __len__.
Currently I am actually working on a CNN, but this would be more common with an LSTM.
python keras
1
You need to define what an epoch is in this case: you either have to measure it (once), or define it in another way (if the data changes, for example).
– Matias Valdenegro
Jan 2 at 18:24
Thank you for the fast response. At the end, when I run out of data (or once I've opened the last file), I do know that the epoch is over. But at that point the __len__ function has already been read, and I can't just output an EOF or something.
– Rabbi
Jan 2 at 19:03
1
What is the issue with making a script that just computes how many data points there are? You need to compute this only once, and then you can hardcode steps per epoch as samples / batch_size.
– Matias Valdenegro
Jan 2 at 20:19
1
Running through (downloading or loading) each file to count how many samples are in each is too expensive, both computationally and financially. I need to be able to load each resource only once per epoch. So I don't know how many samples there will be in the current epoch until I reach the last resource.
– Rabbi
Jan 3 at 23:06
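One possible way around the constraint described in this thread, offered as a sketch rather than a confirmed answer: skip steps_per_epoch and __len__ entirely and drive the epochs by hand, ending each epoch when a plain Python generator is exhausted. All names below (iter_batches, run_epochs, list_resources, load_records, train_step) are hypothetical. With Keras, train_step could be a small wrapper that splits the batch into inputs and targets and calls model.train_on_batch; validation, callbacks and shuffling would then have to be handled manually.

```python
def iter_batches(resources, load_records, batch_size):
    """Yield batches of records, opening each resource exactly once."""
    batch = []
    for resource in resources:
        # load_records is assumed to return an iterable of records.
        for record in load_records(resource):
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Emit the final partial batch instead of dropping it.
        yield batch


def run_epochs(list_resources, load_records, train_step, batch_size, epochs):
    """Train for a number of epochs without knowing the step count up front.

    list_resources is called again at the start of every epoch, so data
    collected in the meantime is picked up by the next epoch.
    Returns the number of steps each epoch turned out to have.
    """
    steps_seen = []
    for _ in range(epochs):
        steps = 0
        for batch in iter_batches(list_resources(), load_records, batch_size):
            train_step(batch)  # e.g. a wrapper around model.train_on_batch
            steps += 1
        steps_seen.append(steps)
    return steps_seen
```

The epoch boundary here is simply "the generator ran out of data", which matches the situation where the end of the epoch is only known once the last resource has been read.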
asked Jan 2 at 17:53
Rabbi