MNIST Classification: mean_squared_error loss function and tanh activation function
I changed the TensorFlow getting-started example as follows:
import tensorflow as tf
from sklearn.metrics import roc_auc_score
import numpy as np
import commons as cm
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    # tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              # loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = cm.Histories()
h = model.fit(x_train, y_train, epochs=50, callbacks=[history])
print("history:", history.losses)
cm.plot_history(h)
# cm.plot(history.losses, history.aucs)

test_predictions = model.predict(x_test)

# Compute the confusion matrix
pred = np.argmax(test_predictions, axis=1)
pred2 = model.predict_classes(x_test)  # alternative way to get class predictions
confusion = confusion_matrix(y_test, pred)
cm.draw_confusion(confusion, range(10))
With its default parameters (relu activation in the hidden layers, softmax at the output layer, and sparse_categorical_crossentropy as the loss function) it works fine, and the predictions for all digits are above 99%.

However, with my parameters (tanh activation and mean_squared_error loss) it just predicts 0 for all test samples.

What is the problem? The reported accuracy increases with each epoch, reaching 99%, while the loss is about 20.
tensorflow machine-learning keras neural-network classification
asked Nov 19 at 9:27 by Ahmad, edited 2 days ago by blue-phoenox

Comment:

MSE is not an appropriate loss function for classification problems, as in your case; you may find this thread useful: What function defines accuracy in Keras when the loss is mean squared error (MSE)? – desertnaut, Nov 19 at 11:04
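As the linked thread discusses, when the loss is MSE the generic 'accuracy' alias is inferred by Keras from the loss and output shape, and with integer labels it can end up computing something other than the multi-class accuracy you expect; that is one way a 99% figure can coexist with a model that always predicts 0. A minimal sketch of a workaround, assuming the model from the question, is to name the intended metric explicitly:

model.compile(optimizer='adam',
              loss='mean_squared_error',
              # name the classification metric explicitly instead of the
              # generic 'accuracy' alias, whose meaning is inferred from
              # the loss in older Keras versions
              metrics=['sparse_categorical_accuracy'])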
1 Answer (accepted)
You need to use the proper loss function for your data. Here you have a categorical output, so you need to use sparse_categorical_crossentropy, but also set from_logits=True and use no activation on the last layer.
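A minimal sketch of that fix (assuming a TensorFlow version that provides tf.keras.losses.SparseCategoricalCrossentropy with a from_logits argument, e.g. TF 2.x):

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    tf.keras.layers.Dense(10)  # no activation: this layer outputs raw logits
])
model.compile(optimizer='adam',
              # from_logits=True lets the loss apply the softmax internally
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)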
If you need to use tanh as your output, then you can use MSE with a one-hot encoded version of your labels, rescaled to the range of tanh.
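A sketch of that tanh + MSE variant follows; the exact rescaling (mapping one-hot {0, 1} targets to {-1, 1}) is my assumption, since tanh outputs lie in (-1, 1) and any affine map into that range would serve:

import numpy as np
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# One-hot encode the integer labels, then rescale {0, 1} -> {-1, 1}
# so the targets lie in the range of tanh
y_train_oh = tf.keras.utils.to_categorical(y_train, 10) * 2.0 - 1.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.tanh),
    tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train_oh, epochs=5)

# Recover class predictions as the index of the largest output
pred = np.argmax(model.predict(x_test), axis=1)

In practice one might soften the targets (e.g. to ±0.9), since tanh only reaches ±1 asymptotically.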
answered Nov 19 at 10:35 by Matthieu Brucher, edited Nov 19 at 12:11

Comments:

Thanks, but I had to use those functions and measure their performance. I think my mistake is that I should evaluate the categorical output in another way. – Ahmad, Nov 19 at 11:37

Using tanh for a logits output doesn't make sense (it's not between 0 and 1, and logit-based cost functions expect unbounded values). What do you mean by "had to use these functions"? If you want to use MSE, use a sigmoid output, clamp the categories at (1e-7, 1-1e-7) to avoid divergence, and try again. But be aware that the results won't sum to one anymore. – Matthieu Brucher, Nov 19 at 11:40

It's an assignment and those things are in the assignment definition, so I can't use other methods unless they are equivalent to what I do. I think I have reached a solution. – Ahmad, Nov 19 at 11:43

Change class, then? It seems like this doesn't teach you the right practices. – Matthieu Brucher, Nov 19 at 11:45

Glad that you could find the error! – Matthieu Brucher, Nov 19 at 14:47