Why does cross_val_score in sklearn flip the value of the metric?
I am fitting this model from sklearn:
LogisticRegressionCV(
solver="sag", scoring="neg_log_loss", verbose=0, n_jobs=-1, cv=10
)
The fitting results in a model.score (on the training set) of 0.67 and change. Since there is no way (or I don't know how) to access the results of the cross-validation performed as part of the model fitting, I run a separate cross-validation on the same model with
cross_val_score(model, X, y, cv=10, scoring="neg_log_loss")
This returns an array of negative numbers
[-0.69517214 -0.69211235 -0.64173978 -0.66429986 -0.77126878 -0.65127196
-0.66302393 -0.65916281 -0.66893633 -0.67605681]
which, if signs were flipped, would seem in a range compatible with the training score.
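Aside: rereading the docs, the internal CV results may be reachable after all via the fitted scores_ attribute. This is a sketch of my understanding of that attribute, not something the rest of this question relies on:
# Assuming `model` is the fitted LogisticRegressionCV from above. Per the
# docs, scores_ is a dict keyed by class; each value is an array of shape
# (n_folds, n_Cs) with the scoring values from the internal cross-validation.
key = list(model.scores_)[0]    # binary problem: a single entry
fold_grid = model.scores_[key]
print(fold_grid.shape)          # (10, n_Cs) for cv=10
print(fold_grid.mean(axis=0))   # mean neg_log_loss per candidate C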
I've read the discussion in an issue about cross_val_score flipping the sign of the given scoring function, and the resolution seemed to be that the neg_* metrics were introduced precisely to make such flipping unnecessary; and I am using neg_log_loss. The issue talks about mse, but the arguments seem to apply to log_loss as well. Is there a way to have cross_val_score return the same metric specified in its arguments? Is this a bug I should file? Or is this a misunderstanding on my part, and a sign change is still to be expected from cross_val_score?
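To make my reading of the neg_* convention concrete: as I understand it, the neg_log_loss scorer is just the plain log_loss metric negated. That can be sanity-checked in isolation (a sketch on a built-in dataset, separate from the repro below):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, log_loss

X_i, y_i = load_iris(return_X_y=True)
clf = LogisticRegression().fit(X_i, y_i)

# Scorers follow the higher-is-better convention by negating loss metrics.
neg = get_scorer("neg_log_loss")(clf, X_i, y_i)
raw = log_loss(y_i, clf.predict_proba(X_i))
assert abs(neg + raw) < 1e-12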
I hope this is a specific enough question for SO. Sklearn devs redirect users to SO for questions that are not clear-cut bug reports or feature requests.
Adding minimal repro code per request in the comments (sklearn v0.19.1, Python 2.7):
from numpy.random import randn, seed
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

seed(0)
X = randn(100, 2)   # random features: no real signal
y = randn(100) > 0  # random boolean labels

model = LogisticRegressionCV(
    solver="sag", scoring="neg_log_loss", verbose=0, n_jobs=-1, cv=10
)
model.fit(X=X, y=y)
model.score(X, y)
cross_val_score(model, X, y, cv=10, scoring="neg_log_loss")
With this code, it no longer looks like a simple sign flip of the metric. The outputs are 0.59 for the score and array([-0.70578452, -0.68773683, -0.68627652, -0.69731349, -0.69198876, -0.70089103, -0.69476663, -0.68279466, -0.70066003, -0.68532253]) for the cross-validation score.
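A quick diagnostic for the repro above, to check what model.score is actually computing (a sketch reusing model, X, y; nothing here beyond the plain metrics):
from sklearn.metrics import accuracy_score, log_loss

# If the first value matches model.score(X, y), score is reporting accuracy
# rather than the neg_log_loss requested via the scoring parameter.
print(accuracy_score(y, model.predict(X)))
print(-log_loss(y, model.predict_proba(X)))  # what neg_log_loss would give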
scikit-learn cross-validation loss-function
Can you show the complete code and possibly some data which reproduces a positive score from model.score()? I am not able to duplicate it on scikit-learn inbuilt datasets.
– Vivek Kumar
Nov 20 '18 at 6:49
The complete code is at github.com/piccolbo/rightload, branch basilica; the ML code is in ml.py. Sharing the data is more complex, and running the code requires access to a web service. I need to think of something more self-contained for a more practical repro.
– piccolbo
Nov 20 '18 at 16:17
The code that generates the positive score is pretty trivial, in ml.py:127 and following lines. model.fit(X,y) followed by model.score(X,y), pretty much. I hope I got your question -- I still owe you some data for a complete repro, of course.
– piccolbo
Nov 20 '18 at 16:25
Got the repro but it requires sharing two pickles with data. Is there a SO preferred way of doing that?
– piccolbo
Nov 20 '18 at 19:11
Replaced repro with one that is self-contained and quick. Doesn't look like a simple sign flip anymore, though.
– piccolbo
Nov 20 '18 at 20:10
1 Answer
Note: edited after the fruitful comment thread with Vivek Kumar and piccolbo.
About the strange results from LogisticRegressionCV's score method
You found a bug, which was fixed in version 0.20.0.
From the changelog:
Fix: Fixed a bug in linear_model.LogisticRegressionCV where the score method always computes accuracy, not the metric given by the scoring parameter. #10998 by Thomas Fan.
Also, sklearn's 0.19 LogisticRegressionCV documentation says:
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
While from version 0.20.0, the docs are updated with the bugfix:
score(X, y, sample_weight=None)
Returns the score using the scoring option on the given test data and labels.
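So a quick way to confirm which behavior a given install exhibits is to compare score against both candidate metrics directly (a sketch, reusing model, X, y from the question's repro):
import sklearn
from sklearn.metrics import accuracy_score, log_loss

print(sklearn.__version__)
print(model.score(X, y))                     # 0.19: accuracy; >= 0.20: the scoring metric
print(accuracy_score(y, model.predict(X)))   # should match score() on 0.19
print(-log_loss(y, model.predict_proba(X)))  # should match score() from 0.20 on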
About the negative values returned by cross_val_score
cross_val_score reports whatever value the scorer produces. For error and loss metrics the scorer is the negated metric (hence the neg_ prefix), so the reported values come out negative, while genuine score metrics keep their natural sign. From the documentation:
All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric.
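A minimal illustration of that convention with cross_val_score itself (a sketch on synthetic data):
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# cross_val_score reports the scorer's value per fold, so with neg_log_loss
# every entry is the negated (hence negative) per-fold log loss.
X_s, y_s = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(), X_s, y_s, cv=5,
                         scoring="neg_log_loss")
print(scores)          # all entries negative
print(-scores.mean())  # average log loss, back on its natural scale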
I don't understand why I got a negative vote. I reduced the assertiveness of my answer, in case that was the problem. I think it adds useful information on the topic, at least.
– Julian Peller
Nov 21 '18 at 4:59
Yes. You are correct. LogisticRegressionCV returns mean accuracy in version 0.19. From version 0.20 upwards, it returns the score for the defined scoring param.
– Vivek Kumar
Nov 21 '18 at 6:37
The problem is solved by upgrading sklearn. This was suggested in a comment which seems to have disappeared, then in Julian's answer, which contains many other things that IMHO are weakly related. If he could simplify it to the point of accuracy vs the requested metric, as changed in the latest sklearn version, I'd be glad to mark it as accepted. Thanks!
– piccolbo
Nov 22 '18 at 2:29
@piccolbo glad to hear it's solved! It was a really tricky scenario. I made some edits to the answer, removing some argumentative detours, giving relevant credit, keeping the information on the accuracy problem (with detailed explicit citations on the matter) and also the information about the sign flip for cross_val_score (which is not trivial and seems relevant too, at least for the first part of your question, before the repro code). Does it look good? Any suggestions?
– Julian Peller
Nov 22 '18 at 2:58
Actually, I found the bugfix in the changelog!! It's "LogisticRegressionCV.score doesn't respect scoring, inconsistent with GridSearchCV". Adding this to the answer.
– Julian Peller
Nov 22 '18 at 3:06