MNIST Classification: mean_squared_error loss function and tanh activation function

I changed the TensorFlow getting-started example as follows:

import tensorflow as tf
from sklearn.metrics import roc_auc_score
import numpy as np
import commons as cm
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.tanh),
  # tf.keras.layers.Dense(512, activation=tf.nn.tanh),
  # tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              # loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = cm.Histories()
h = model.fit(x_train, y_train, epochs=50, callbacks=[history])
print("history:", history.losses)
cm.plot_history(h)
# cm.plot(history.losses, history.aucs)

test_predictions = model.predict(x_test)

# Compute confusion matrix
pred = np.argmax(test_predictions, axis=1)
pred2 = model.predict_classes(x_test)
confusion = confusion_matrix(y_test, pred)
cm.draw_confusion(confusion, range(10))

With its default parameters:

relu activation in the hidden layers,

softmax at the output layer, and

sparse_categorical_crossentropy as the loss function,

it works fine, and the predictions for all digits are above 99%.

However, with my parameters (tanh activation function and mean_squared_error loss function), it just predicts 0 for all test samples.

I wonder what the problem is. The accuracy increases with each epoch and reaches 99%, while the loss is about 20.

edited 2 days ago by blue-phoenox

asked Nov 19 at 9:27 by Ahmad

MSE is not an appropriate loss function for classification problems, as in your case; you may find this thread useful: What function defines accuracy in Keras when the loss is mean squared error (MSE)?
– desertnaut
Nov 19 at 11:04
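
A possible explanation for the numbers in the question, offered as a sketch rather than a confirmed diagnosis: y_train holds integer labels 0 to 9, while the network emits 10 tanh outputs bounded in [-1, 1]. Assuming the loss broadcasts each integer label against all 10 outputs (an assumption about how Keras aligns the shapes here), the best the network can do is push every output toward +1. The NumPy snippet below uses made-up labels, one per digit, to show what that would give: an MSE of about 20 and an argmax of 0 for every sample.

```python
import numpy as np

# Hypothetical labels: one sample per digit, standing in for y_test.
labels = np.arange(10)

# Assumed saturated network: every one of the 10 tanh outputs sits at +1.
outputs = np.ones((labels.size, 10))

# MSE with each integer label broadcast against all 10 outputs.
mse = np.mean((labels[:, None] - outputs) ** 2)

# With all outputs tied, argmax returns the first index, i.e. class 0.
pred = np.argmax(outputs, axis=1)

print(mse)   # 20.5
print(pred)  # [0 0 0 0 0 0 0 0 0 0]
```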

tensorflow machine-learning keras neural-network classification

1 Answer

accepted

You need to use the proper loss function for your data. Here you have a categorical output, so you should use sparse_categorical_crossentropy, with from_logits=True and no activation on the last layer.
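
A minimal sketch of that setup, assuming TensorFlow 2.x. The function name build_logits_model is hypothetical, and the 512-unit tanh hidden layer is copied from the question rather than prescribed by this answer:

```python
import numpy as np
import tensorflow as tf


def build_logits_model():
    """Classifier whose last layer emits raw logits (no activation)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="tanh"),
        tf.keras.layers.Dense(10),  # no activation: outputs are logits
    ])
    model.compile(
        optimizer="adam",
        # from_logits=True makes the loss apply softmax internally,
        # so the integer labels 0..9 can be passed directly.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model


model = build_logits_model()
# The predicted class is the index of the largest logit.
logits = model.predict(np.zeros((2, 28, 28), dtype="float32"), verbose=0)
pred = np.argmax(logits, axis=1)
```

Training would then use the unmodified integer labels, for example model.fit(x_train, y_train, epochs=5).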

If you need to use tanh as your output, then you can use MSE with a one-hot encoded version of your labels, rescaled to match the output range of tanh.
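
One way to read "one-hot encoded labels + rescaling", sketched with NumPy: map the one-hot 0/1 targets to -1/+1 so they match the range of tanh. The helper name to_tanh_targets and the -1/+1 scaling are assumptions for illustration; the answer does not specify the exact rescaling.

```python
import numpy as np


def to_tanh_targets(labels, num_classes=10):
    """One-hot encode integer labels, then rescale 0/1 to -1/+1."""
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return 2.0 * one_hot - 1.0


targets = to_tanh_targets(np.array([0, 3, 9]))
print(targets.shape)  # (3, 10)
print(targets[1])     # +1 at index 3, -1 elsewhere

# The true class stays at the argmax of each row, so predictions from a
# model trained on these targets can be decoded with np.argmax(..., axis=1).
```

With targets of this shape, the question's model could be trained via model.fit(x_train, to_tanh_targets(y_train), ...), and the confusion matrix computed from np.argmax of the predictions, as the question already does.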

edited Nov 19 at 12:11

answered Nov 19 at 10:35 by Matthieu Brucher

Thanks, but I had to use those functions and measure their performance. I think my mistake is that I should evaluate the categorical output in another way.
– Ahmad
Nov 19 at 11:37

Using tanh for a logits output doesn't make sense (it's not between 0 and 1, and the cost functions expect unbounded values). What do you mean by "had to use these functions"? If you want to use MSE, use a sigmoid output, clamp the categories at (1e-7, 1-1e-7) to avoid divergence, and try again. But be aware that the results won't sum to one anymore.
– Matthieu Brucher
Nov 19 at 11:40
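
The clamping suggested in this comment could look roughly like the following NumPy sketch. The helper name clamped_one_hot is hypothetical; only the bounds (1e-7, 1-1e-7) come from the comment.

```python
import numpy as np


def clamped_one_hot(labels, num_classes=10, eps=1e-7):
    """One-hot targets pulled slightly inside (0, 1) for a sigmoid output."""
    one_hot = np.eye(num_classes, dtype=np.float64)[labels]
    return np.clip(one_hot, eps, 1.0 - eps)


# Two made-up labels; every target ends up strictly between 0 and 1.
targets = clamped_one_hot(np.array([2, 7]))
```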

It's an assignment, and those things are in the assignment definition, so I can't use other methods unless they are equivalent to what I do. I think I reached a solution.
– Ahmad
Nov 19 at 11:43

Change class then? Seems like this doesn't teach you the right practices.
– Matthieu Brucher
Nov 19 at 11:45

Glad that you could find the error!
– Matthieu Brucher
Nov 19 at 14:47
