"coords" function of the "pROC" package returns different sensitivity and specificity values than caret's "confusionMatrix"
Hi everybody, and thank you very much in advance for your help.
I have trained a random forest classification model. Now I want to determine the best threshold to optimize specificity and sensitivity.
I am confused because, as stated in the title, the "coords" function of the "pROC" package returns different values than the "confusionMatrix" function of the "caret" package.
Below is the code:
# package import
library(caret)
library(pROC)

# data import
data <- read.csv2("denonciation.csv", check.names = FALSE)

# data partition
validation_index <- createDataPartition(data$Denonc, p = 0.80, list = FALSE)
validation   <- data[-validation_index, ]
entrainement <- data[validation_index, ]

# handling class imbalance
set.seed(7)
up_entrainement <- upSample(x = entrainement[, -ncol(entrainement)],
                            y = entrainement$Denonc)

# cross-validation settings
control <- trainControl(method = "cv", number = 10, classProbs = TRUE)

# model training
fit.rf_up <- train(Denonc ~ EMOTION + Agreabilite_classe + Conscienciosite_classe,
                   data = up_entrainement, method = "rf", trControl = control)

# best-threshold determination
roc <- roc(up_entrainement$Denonc,
           predict(fit.rf_up, up_entrainement, type = "prob")[, 2])
coords(roc, x = "best", input = "threshold", best.method = "closest.topleft")
### The best threshold seems to be .36 with a specificity of .79 and a sensitivity of .73 ###

# confusion matrix with the best threshold returned by "coords"
probsTest <- predict(fit.rf_up, validation, type = "prob")
threshold <- 0.36
predictions <- factor(ifelse(probsTest[, "denoncant"] > threshold,
                             "denoncant", "non_denoncant"))
confusionMatrix(predictions, validation$Denonc)
Here the values are different:
Confusion Matrix and Statistics
Reference
Prediction denoncant non_denoncant
denoncant 433 1380
non_denoncant 386 1671
Accuracy : 0.5437
95% CI : (0.5278, 0.5595)
No Information Rate : 0.7884
P-Value [Acc > NIR] : 1
Kappa : 0.0529
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.5287
Specificity : 0.5477
Pos Pred Value : 0.2388
Neg Pred Value : 0.8123
Prevalence : 0.2116
Detection Rate : 0.1119
Detection Prevalence : 0.4685
Balanced Accuracy : 0.5382
'Positive' Class : denoncant
Please, could you tell me why the "coords" function of the "pROC" package returns such different values?
Many thanks,
Baboune
r machine-learning r-caret proc confusion-matrix
If I am not mistaken you chose the cutoff value based on train data (data the model saw). What you should have done is to choose the cutoff value based on hold out predictions in re-sampling.
– missuse
Nov 19 '18 at 9:40
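Building on this comment: caret can save the hold-out predictions from each CV fold (`savePredictions` in `trainControl`), and the cutoff can then be chosen on predictions the model did not see during each fold's fit. A minimal sketch on synthetic stand-in data (the data frame, class names, and model here are made up for illustration, not taken from denonciation.csv):

```r
library(caret)
library(pROC)

set.seed(7)
# Synthetic two-class data standing in for the question's dataset
n <- 400
x <- data.frame(EMOTION = rnorm(n), Agreabilite_classe = rnorm(n))
y <- factor(ifelse(x$EMOTION + rnorm(n) > 0, "denoncant", "non_denoncant"),
            levels = c("denoncant", "non_denoncant"))

# savePredictions = "final" keeps the out-of-fold predictions
# for the final tuning parameters in fit$pred
control <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                        savePredictions = "final")
fit <- train(x = x, y = y, method = "rf", trControl = control)

# ROC on hold-out predictions: one row per held-out sample,
# observed class in $obs, one probability column per class level
roc_cv <- roc(fit$pred$obs, fit$pred$denoncant,
              levels = c("non_denoncant", "denoncant"))

# In recent pROC versions coords() returns a data.frame
best <- coords(roc_cv, x = "best", input = "threshold",
               best.method = "closest.topleft")
best$threshold  # cutoff chosen on data unseen within each fold
```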
asked Nov 19 '18 at 8:19 by Baboune
edited Nov 19 '18 at 11:06 by desertnaut
1 Answer
There are two possible issues here that I can see:
While training the model, the two classes are balanced by up-sampling the minority class, so the "best" threshold returned by coords is calibrated on that same up-sampled training set. As far as I can see, the validation set is not up-sampled, so it has a different class distribution.
The two results report metrics on different sets (training and validation). These are usually close for a random forest, given all the averaging that happens under the hood, but they will not be exactly the same. A random forest rarely over-fits badly, yet it can when the data is a mixture of sub-populations with different feature distributions and/or different feature-response relationships: those sub-populations need not be evenly split between the training and validation sets, even under random sampling (the distributions match on average, but not necessarily for a particular split).
I think the first issue is the culprit, but unfortunately I can't test your code, since it depends on the file denonciation.csv.
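The first point can be illustrated on synthetic data: pick the "best" threshold on an up-sampled training set, then apply that same cutoff to an untouched, imbalanced validation set and watch the metrics drop. A sketch with made-up data and class names (not the question's denonciation.csv):

```r
library(caret)
library(pROC)

set.seed(7)
# Imbalanced synthetic data: roughly 20% positives, similar to the
# prevalence in the question's confusion matrix
n <- 2000
x <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
y <- factor(ifelse(x$f1 + rnorm(n, sd = 2) > 1.7, "pos", "neg"),
            levels = c("neg", "pos"))

idx      <- createDataPartition(y, p = 0.8, list = FALSE)
train_up <- upSample(x = x[idx, ], y = y[idx])   # balanced training set
valid_x  <- x[-idx, ]                            # untouched validation set
valid_y  <- y[-idx]

fit <- train(Class ~ ., data = train_up, method = "rf",
             trControl = trainControl(method = "cv", number = 5,
                                      classProbs = TRUE))

# Threshold tuned on the balanced training data the model has already seen...
roc_tr <- roc(train_up$Class, predict(fit, train_up, type = "prob")[, "pos"])
thr <- coords(roc_tr, x = "best", input = "threshold",
              best.method = "closest.topleft")$threshold

# ...typically looks worse on the imbalanced, unseen validation data
pred <- factor(ifelse(predict(fit, valid_x, type = "prob")[, "pos"] > thr,
                      "pos", "neg"), levels = levels(valid_y))
confusionMatrix(pred, valid_y, positive = "pos")
```

Comparing the sensitivity/specificity reported by coords on roc_tr with those from the validation confusionMatrix reproduces the kind of gap described in the question.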
answered Nov 19 '18 at 9:37 by ritwik33, edited Nov 19 '18 at 10:52