Why does cross_val_score in sklearn flip the value of the metric?

I am fitting this model from sklearn.



LogisticRegressionCV(
    solver="sag", scoring="neg_log_loss", verbose=0, n_jobs=-1, cv=10
)


The fitting results in a model.score (on the training set) of 0.67 and change. Since there is no way (or I don't know how) to access the results of the cross-validation performed as part of the model fitting, I run a separate cross-validation on the same model with



cross_val_score(model, X, y, cv=10, scoring="neg_log_loss")


This returns an array of negative numbers:



[-0.69517214 -0.69211235 -0.64173978 -0.66429986 -0.77126878 -0.65127196
-0.66302393 -0.65916281 -0.66893633 -0.67605681]


which, if the signs were flipped, would be in a range compatible with the training score.
I've read the discussion in an issue about cross_val_score flipping the sign of the given scoring function; the resolution seemed to be that the neg_* metrics were introduced precisely to make such flipping unnecessary, and I am using neg_log_loss. The issue talks about MSE, but the arguments seem to apply to log_loss as well. Is there a way to have cross_val_score return the same metric specified in its arguments? Or is this a bug I should file? Or is it a misunderstanding on my part, and a sign change is still to be expected from cross_val_score?



I hope this is a specific enough question for SO. Sklearn devs redirect users to SO for questions that are not clear-cut bug reports or feature requests.



Adding minimal repro code per request in the comments (sklearn v0.19.1, Python 2.7):



from numpy.random import randn, seed
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

seed(0)
X = randn(100, 2)
y = randn(100) > 0  # random labels: log loss should hover near log(2) ~ 0.693
model = LogisticRegressionCV(
    solver="sag", scoring="neg_log_loss", verbose=0, n_jobs=-1, cv=10
)
model.fit(X=X, y=y)
model.score(X, y)

cross_val_score(model, X, y, cv=10, scoring="neg_log_loss")


With this code, it doesn't look like a simple sign flip for the metric anymore. The outputs are 0.59 for the score and array([-0.70578452, -0.68773683, -0.68627652, -0.69731349, -0.69198876, -0.70089103, -0.69476663, -0.68279466, -0.70066003, -0.68532253]) for the cross-validation score.
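For reference, the per-fold results of the internal cross-validation seem to be reachable after all, via the scores_ attribute (a sketch based on my reading of the LogisticRegressionCV docs, assuming the model above has been fitted; scores_ maps a class label to an array of shape (n_folds, n_Cs)):

# Hedged sketch: scores_ is documented as a dict keyed by class label; each value
# holds the scoring-metric value for every fold and every candidate C tried internally.
fold_scores = model.scores_[True]   # labels are booleans here, from randn(100) > 0
print(fold_scores.shape)            # expected (10, 10): cv=10 folds, default Cs=10
print(fold_scores.mean(axis=0))     # mean neg_log_loss across folds, per candidate C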










scikit-learn cross-validation loss-function

asked Nov 19 '18 at 19:48 by piccolbo, edited Nov 20 '18 at 20:09

  • Can you show the complete code and possibly some data which reproduces a positive score from model.score()? I am not able to duplicate it on scikit-learn's built-in datasets.
    – Vivek Kumar
    Nov 20 '18 at 6:49

  • The complete code is at github.com/piccolbo/rightload, branch basilica; the ML code is in ml.py. Sharing the data is more complex, and running the code requires access to a web service. I need to think of something more self-contained for a more practical repro.
    – piccolbo
    Nov 20 '18 at 16:17

  • The code that generates the positive score is pretty trivial, in ml.py:127 and following lines: model.fit(X, y) followed by model.score(X, y), pretty much. I hope I got your question -- I still owe you some data for a complete repro, of course.
    – piccolbo
    Nov 20 '18 at 16:25

  • Got the repro, but it requires sharing two pickles with data. Is there an SO-preferred way of doing that?
    – piccolbo
    Nov 20 '18 at 19:11

  • Replaced the repro with one that is self-contained and quick. It doesn't look like a simple sign flip anymore, though.
    – piccolbo
    Nov 20 '18 at 20:10

1 Answer

Note: edited after the fruitful comment thread with Vivek Kumar and piccolbo.



About the LogisticRegressionCV score method's strange results



You found a bug, which was fixed in version 0.20.0.



From the changelog:




Fix: Fixed a bug in linear_model.LogisticRegressionCV where the score method always computes accuracy, not the metric given by the scoring parameter. #10998 by Thomas Fan.




Also, sklearn's 0.19 LogisticRegressionCV documentation says:




score(X, y, sample_weight=None)



Returns the mean accuracy on the given test data and labels.




From version 0.20.0 on, the docs are updated with the bugfix:




score(X, y, sample_weight=None)



Returns the score using the scoring option on the given test data and labels.
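A quick way to see the two metrics side by side (a minimal sketch, assuming the repro code from the question has been run; accuracy_score and log_loss come from sklearn.metrics):

from sklearn.metrics import accuracy_score, log_loss

# On 0.19, model.score(X, y) ignores scoring and reports plain accuracy:
print(accuracy_score(y, model.predict(X)))    # ~0.59, matches model.score
# The negated log loss is on the scale cross_val_score reports per fold:
print(-log_loss(y, model.predict_proba(X)))   # ~-0.69, same range as the CV array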






About the negative values returned in cross_val_score



cross_val_score negates the value for error and loss metrics, so that higher is always better, while it preserves the sign for score metrics. From the documentation:




All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric.
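The same convention can be reproduced by hand with make_scorer (a sketch; greater_is_better=False is what introduces the negation, and needs_proba=True makes the scorer use predict_proba, as the log-loss scorer of that era did):

from sklearn.metrics import log_loss, make_scorer

# Roughly equivalent to scoring="neg_log_loss": the scorer computes log_loss and
# returns its negation, so "higher is better" holds uniformly across all scorers.
neg_log_loss_scorer = make_scorer(log_loss, greater_is_better=False, needs_proba=True)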







edited Nov 22 '18 at 3:10, answered Nov 20 '18 at 20:13
Julian Peller

  • I don't understand why I got a negative vote. I reduced the assertiveness of my answer, in case that was the problem. I think it adds useful information on the topic, at least.
    – Julian Peller
    Nov 21 '18 at 4:59

  • Yes, you are correct. LogisticRegressionCV returns mean accuracy in version 0.19. From version 0.20 upwards, it returns the score for the defined scoring param.
    – Vivek Kumar
    Nov 21 '18 at 6:37

  • The problem is solved by upgrading sklearn. This was suggested in a comment which seems to have disappeared, then in Julian's answer, which contains many other things that IMHO are weakly related. If he could simplify it to the point about accuracy vs the requested metric, as changed in the latest sklearn version, I'd be glad to mark it as accepted. Thanks!
    – piccolbo
    Nov 22 '18 at 2:29

  • @piccolbo glad to hear it's solved! It was a really tricky scenario. I made some edits to the answer, removing some argumentative detours, giving the relevant credits, and keeping the information on the accuracy problem (with detailed explicit citations on the matter) as well as the information about the sign flip for cross_val_score (which is not trivial and seems relevant too, at least for the first part of your question before the repro code). Does it look good? Any suggestions?
    – Julian Peller
    Nov 22 '18 at 2:58

  • Actually, I found the bugfix in the changelog!! It's "LogisticRegressionCV.score doesn't respect scoring, inconsistent with GridSearchCV". Adding this to the answer.
    – Julian Peller
    Nov 22 '18 at 3:06










