Neural network keeps reproducing the baseline classifier
I'm trying to train a network for a binary classification problem. (It's a convnet built with keras in R, for image recognition; specifically, the Human Protein Image Classification challenge on Kaggle. But I don't think the details are hugely important here: the same thing has happened before on a completely different problem, a multi-class classification task with text data and different software (Spark), so I'll keep this question very general.)
My training examples are labeled '0' and '1'. There are more 0's than 1's in the training data. The networks I train (while using binary crossentropy as my loss function) keep reproducing the baseline classifier; that is, the classifier that predicts '0' all the time, regardless of the test input.
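To be concrete, this is the kind of check that exposes the behavior, as a minimal sketch in the R keras interface; `model` and `x_test` are placeholder names for the fitted model and the test tensor:

```r
library(keras)

# Inspect the raw sigmoid outputs; if the classifier has collapsed to the
# baseline, they barely vary and all fall on the same side of 0.5.
probs <- model %>% predict(x_test)   # predicted P(label = 1) per image
summary(as.vector(probs))            # near-constant values => baseline classifier
table(as.integer(probs > 0.5))       # every prediction comes out '0'
```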
It's not at all mystifying to me why this should sometimes happen. First, there are lots of network configurations that reproduce this classifier; for instance, it wouldn't be hard at all to engineer a network that outputs '0' all the time regardless of the input. Second, any such configuration is presumably a local minimum of the loss function on the loss landscape, and finding local minima of the loss function is exactly what we ask these networks to do. So we can hardly blame them for sometimes coming up with this "somewhat good" configuration after training. But this problem has been particularly persistent for me.
MY QUESTION: is this "regression to the baseline" a common problem in deep learning, and what are some "best practice" ways to either avoid it or combat it?
Just to motivate discussion, I'll mention a few possible courses of action that have already occurred to me, some of which I've actually tried (with no success):
1) Increasing the network complexity (adding more layers, more neurons per layer, more filters in the case of convnets, etc.). This is the obvious first move; maybe the network just isn't "smart" enough, even given the best training, to differentiate between '0' and '1', and the baseline really is the best you can hope for this network architecture to accomplish.
This I've tried. I've even tried a pre-trained convnet with two densely connected layers and 41 million trainable parameters. Same result.
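(For reference, "adding capacity" here means stacking more and wider layers, roughly along the lines of the sketch below; the filter/unit counts and input shape are illustrative, not my actual configuration:)

```r
library(keras)

# A deeper/wider convnet variant; all sizes below are illustrative only.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(128, 128, 3)) %>%   # hypothetical image shape
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")    # single sigmoid unit for 0/1 labels
```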
2) Changing the loss function. I tried this, and it didn't help. Notably, when I train with loss = binary_crossentropy (with accuracy as the metric), it produces the baseline classifier for that metric (predicting all '0's); and when I train with loss = F1_score, it produces the baseline classifier for that metric (predicting all '1's). So again, the network is obviously doing what it's supposed to, finding a good local minimum; it's just a horrible solution.
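(A related option I'm aware of, shown here as a sketch rather than something I've verified fixes it: keep binary crossentropy but pass class weights to fit(), so that "predict all '0'" stops being a comfortable minimum. The 1.0/1.5 weights below are made up, not tuned values:)

```r
# Sketch: keep binary crossentropy but re-weight the classes in fit().
model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 32,
  validation_split = 0.2,
  class_weight = list("0" = 1.0, "1" = 1.5)  # make missed '1's cost more
)
```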
3) Just train the whole thing over again (with a different random initial configuration). I tried this, and it didn't help; it keeps reproducing the baseline. So the baseline isn't just popping up through bad luck; it seems to be ubiquitous.
4) Adjust the learning rate. Tried this, no luck. And really, there's no reason to expect this to help; if it found the baseline before, slowing the learning rate probably won't help to "unfind" it.
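(For completeness, this is roughly what the learning-rate experiments looked like; the rate and the plateau-callback settings are illustrative:)

```r
# Sketch: a smaller fixed learning rate plus a callback that shrinks it
# further when validation loss stalls. All numbers are illustrative.
model %>% compile(
  optimizer = optimizer_adam(lr = 1e-4),  # down from the default 1e-3
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 32,
  validation_split = 0.2,
  callbacks = list(
    callback_reduce_lr_on_plateau(monitor = "val_loss", factor = 0.5, patience = 3)
  )
)
```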
Anyone else run into this problem? And how did you deal with it?
keras deep-learning classification

asked Jan 3 at 3:59 by Mike Crumley
How imbalanced are your classes? – jonnor, Jan 3 at 10:28

Not very, about a 40/60 split. – Mike Crumley, Jan 3 at 14:14
1 Answer
I'm not sure what the best way is, but I can share some of my experience. First, it's better to use the same number of '0'-labeled and '1'-labeled samples. Second, if training converges to the baseline every time, your data may be too noisy; try breaking the problem down to make it less random. – CodingLab, answered Jan 3 at 6:35
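(A minimal sketch of what the answer's first suggestion, equalizing the class counts by oversampling, might look like in plain R; `x_train` and `y_train` are placeholder names for a 4-D image tensor and its 0/1 label vector. Class weights, as in the earlier sketch, achieve a similar effect without duplicating data:)

```r
# Oversample the minority '1' class until both classes have equal counts.
idx0 <- which(y_train == 0)
idx1 <- which(y_train == 1)
idx1_up <- sample(idx1, length(idx0), replace = TRUE)  # resample '1's up to the '0' count

keep  <- sample(c(idx0, idx1_up))                      # shuffle the combined indices
x_bal <- x_train[keep, , , , drop = FALSE]             # keep the 4-D array shape
y_bal <- y_train[keep]
```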
I'd rather not artificially balance the classes, since the test (real-world) data is unbalanced. And I know the problem isn't the data itself: other people are coming up with perfectly respectable classifiers for this problem. I've also tried augmenting the images with random transformations, with no luck there either. – Mike Crumley, Jan 3 at 14:17