MNIST Classification: mean_squared_error loss function and tanh activation function























I changed the TensorFlow getting-started example as follows:



import tensorflow as tf
from sklearn.metrics import roc_auc_score
import numpy as np
import commons as cm
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    # tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              # loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = cm.Histories()
h = model.fit(x_train, y_train, epochs=50, callbacks=[history])
print("history:", history.losses)
cm.plot_history(h)
# cm.plot(history.losses, history.aucs)


test_predictions = model.predict(x_test)


# Compute confusion matrix
pred = np.argmax(test_predictions,axis=1)
pred2 = model.predict_classes(x_test)
confusion = confusion_matrix(y_test, pred)
cm.draw_confusion(confusion,range(10))


With its default parameters:

  • relu activation in the hidden layers,
  • softmax at the output layer, and
  • sparse_categorical_crossentropy as the loss function,

it works fine and the prediction accuracy for all digits is above 99%.



However, with my parameters (tanh activation function and mean_squared_error loss function), it predicts 0 for all test samples.

What is the problem? The accuracy increases with each epoch and reaches 99%, and the loss is about 20.






























  • MSE is not an appropriate loss function for classification problems such as yours; you may find this thread useful: What function defines accuracy in Keras when the loss is mean squared error (MSE)?
    – desertnaut
    Nov 19 at 11:04
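A minimal sketch of one plausible reading of the symptoms above, using NumPy only. It rests on three assumptions that the question does not confirm: the integer labels are broadcast against the ten tanh outputs, every output unit has saturated at +1, and the accuracy metric compares the argmax of the targets with the argmax of the outputs. Under those assumptions the loss lands near 20, every prediction is class 0, and the reported accuracy is still high:

```python
import numpy as np

# Assumption: integer labels 0..9, one of each, standing in for y_train.
y_true = np.arange(10, dtype=np.float64).reshape(-1, 1)   # shape (10, 1)

# Assumption: the tanh output layer has saturated at +1 for every unit.
y_pred = np.ones((10, 10))                                # shape (10, 10)

# MSE broadcasts the (10, 1) labels against the (10, 10) outputs.
mse = np.mean((y_true - y_pred) ** 2)                     # 20.5

# argmax over a row of ties returns the first index, i.e. class 0.
pred_classes = np.argmax(y_pred, axis=1)

# Assumption: a categorical-style accuracy compares argmax of targets and
# outputs; with a single target column, argmax(y_true) is always 0 too.
categorical_acc = np.mean(np.argmax(y_true, axis=1) == pred_classes)

print(mse, pred_classes, categorical_acc)
```

If this reading is right, the high accuracy says nothing about the digits; a confusion matrix built from the integer labels is the more trustworthy check.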






















tensorflow machine-learning keras neural-network classification














edited 2 days ago by blue-phoenox

asked Nov 19 at 9:27 by Ahmad




















1 Answer



accepted










You need to use the proper loss function for your data. Here you have a categorical output, so you need to use sparse_categorical_crossentropy, but also set from_logits=True and leave the last layer without any activation.
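To illustrate what "from logits" means, here is a small NumPy sketch of a sparse categorical cross-entropy that takes raw, unactivated scores and applies the softmax inside the loss. The function name is hypothetical and this is not the Keras implementation; in TF 2.x Keras the corresponding loss is, as far as I know, tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) combined with a final Dense(10) layer that has no activation.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean cross-entropy between integer labels and raw (unactivated) scores."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # Log-softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Illustrative values only: two samples, three classes.
logits = np.array([[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
print(sparse_categorical_crossentropy_from_logits(labels, logits))
```

Because the softmax lives inside the loss, the scores fed to it must be unbounded, which is why a bounded activation such as tanh on the last layer does not fit this setup.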



If you must use tanh on the output layer, you can use MSE with a one-hot encoded version of your labels plus rescaling (for example, mapping the targets to tanh's output range of -1 to 1).
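A rough sketch of that target encoding, assuming "rescaling" means mapping the one-hot values {0, 1} to {-1, +1}. The helper names to_tanh_targets and decode are hypothetical, not part of Keras:

```python
import numpy as np

def to_tanh_targets(labels, num_classes=10):
    """One-hot encode integer labels, then rescale {0, 1} to {-1, +1}."""
    labels = np.asarray(labels, dtype=np.int64)
    one_hot = np.eye(num_classes)[labels]   # 1 at the true class, 0 elsewhere
    return 2.0 * one_hot - 1.0

def decode(outputs):
    """Turn per-class outputs back into class indices."""
    return np.argmax(outputs, axis=1)

# Illustrative labels only.
targets = to_tanh_targets([3, 0, 9])
print(targets.shape, decode(targets))
```

The encoded array would be passed to model.fit in place of the integer y_train, and decode would be applied to model.predict(x_test) before building the confusion matrix against the integer y_test.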





























  • Thanks, but I had to use those functions and measure their performance. I think my mistake is that I should evaluate the categorical output in another way.
    – Ahmad
    Nov 19 at 11:37










  • Using tanh for a logits output doesn't make sense (it's not between 0 and 1, and the cost functions expect unbounded values). What do you mean by "had to use those functions"? If you want to use MSE, use a sigmoid output, clamp the categories to (1e-7, 1-1e-7) to avoid divergence, and try again. But be aware that the results won't sum to one anymore.
    – Matthieu Brucher
    Nov 19 at 11:40










  • It's an assignment and those things are in the assignment definition, so I can't use other methods unless they are equivalent to what I do. I think I reached a solution.
    – Ahmad
    Nov 19 at 11:43










  • Change class then? Seems like this doesn't teach you the right practices.
    – Matthieu Brucher
    Nov 19 at 11:45






  • Glad that you could find the error!
    – Matthieu Brucher
    Nov 19 at 14:47
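The point in the comments about sigmoid outputs not summing to one can be checked with a few lines of NumPy. This is only an illustration with made-up scores; the clamping line follows the (1e-7, 1-1e-7) bounds suggested in the comment above:

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; each output is independent of the others."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over a 1-D score vector; outputs are coupled and sum to one."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative scores for three classes.
scores = np.array([2.0, 1.0, -1.0])

print(softmax(scores).sum())   # sums to one
print(sigmoid(scores).sum())   # generally does not

# One-hot targets clamped away from exactly 0 and 1, as the comment suggests.
clamped = np.clip(np.eye(3), 1e-7, 1 - 1e-7)
```

With sigmoid outputs the predicted class can still be read off with argmax, but the values should not be interpreted as a probability distribution over the digits.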






















edited Nov 19 at 12:11

answered Nov 19 at 10:35 by Matthieu Brucher






























 
