Evaluate predictions by comparing to actual outcomes - but no categories to use

I have a dataset like so (left column produced by an estimation algorithm, right column being what happened in reality).
[EDIT: Please note that every event is distinct (i.e., we are not repeatedly testing the same event; each time we are making a prediction about some new kind of thing that may or may not happen).]



event probability        actual outcome (whether event occurred)
0.939658077 TRUE
0.705453465 FALSE
0.310251296 TRUE
0.385363009 FALSE
0.660532932 FALSE
0.290306978 TRUE
0.484473665 FALSE
0.01615261 FALSE
0.898152645 TRUE
0.389938993 TRUE
0.032598374 FALSE
0.599836035 FALSE
0.428701779 TRUE
0.7787285 TRUE
0.14356366 FALSE
0.65105148 FALSE
0.418174021 FALSE
0.724846388 TRUE
0.844266775 TRUE
0.437018647 TRUE
... ...


How can I evaluate the quality of the prediction algorithm? (Assume the data set is large enough.)



Thanks!!



EDIT: For example, if the estimated probability is 0.5, the model is saying it doesn't know what to predict, so in a sense there is zero error whatever the outcome. Likewise, the model could estimate a 0.9 probability of the event occurring, and one time in ten you would still expect it not to occur. Over the full dataset, however, if the model keeps saying 0.1 and the event usually occurs, or keeps saying 0.9 and the event usually does not occur, then it is performing poorly.
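(Editorial sketch, not part of the original question: one way to make the intuition above concrete is to bin the predictions and compare each bin's average predicted probability with the observed frequency of TRUE outcomes; a well-calibrated model should roughly match. The Python below is illustrative, with hypothetical variable names, run on the first few rows of the posted data.)

    # Group predictions into probability bins and compare the mean predicted
    # probability in each bin with the observed rate of TRUE outcomes.
    probs = [0.939658077, 0.705453465, 0.310251296, 0.385363009, 0.660532932]
    outcomes = [True, False, True, False, False]  # whether each event occurred

    def calibration_table(probs, outcomes, n_bins=10):
        """Per probability bin: mean predicted probability vs. observed rate."""
        bins = [[] for _ in range(n_bins)]
        for p, y in zip(probs, outcomes):
            idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
            bins[idx].append((p, y))
        for idx, members in enumerate(bins):
            if members:
                mean_pred = sum(p for p, _ in members) / len(members)
                obs_rate = sum(y for _, y in members) / len(members)
                yield idx / n_bins, (idx + 1) / n_bins, mean_pred, obs_rate, len(members)

    for lo, hi, mean_pred, obs_rate, n in calibration_table(probs, outcomes):
        print(f"[{lo:.1f}, {hi:.1f}): predicted {mean_pred:.2f}, observed {obs_rate:.2f}, n={n}")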

Tags: probability, statistics, mathematical-modeling

asked Jan 10 at 14:35 by sesquipedalias (edited Jan 10 at 17:36)

          1 Answer

A confusion matrix and ROC curves will probably suit you.



          Check this out: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
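(Editorial sketch; the answer itself gives no code. Assuming the two columns sit in Python lists, the suggested confusion matrix and ROC analysis could be run with scikit-learn as below. The variable names and the 0.5 threshold are illustrative assumptions, and the data are the first ten rows posted in the question.)

    from sklearn.metrics import confusion_matrix, roc_auc_score

    probs = [0.939658077, 0.705453465, 0.310251296, 0.385363009, 0.660532932,
             0.290306978, 0.484473665, 0.01615261, 0.898152645, 0.389938993]
    outcomes = [True, False, True, False, False,
                True, False, False, True, True]

    # A confusion matrix needs hard labels, so threshold the probabilities.
    predicted = [p >= 0.5 for p in probs]
    print(confusion_matrix(outcomes, predicted))

    # ROC AUC consumes the raw probabilities directly: it measures how well
    # the scores rank occurring events above non-occurring ones (0.5 = chance).
    print(roc_auc_score(outcomes, probs))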






answered Jan 10 at 14:40 by tfkLSTM

• Could you explain more instead of just giving a link? – Larry, Jan 10 at 14:48

• Thanks for the link, I'm reading up on your suggestion right now! (But why is the answer already appearing as accepted, before I even look at it? I tried to un-accept it just to see if I can control the feedback [not because I have any problem with the answer - I've just started reading the linked page], but nothing happened.) – sesquipedalias, Jan 10 at 16:56

• Hmmm, the suggested solution requires classification into predefined classes, but as I put in the title of the question, here we have no classes: we just have a probability of an event occurring, and then the event may or may not occur (of course, either result is consistent with any estimate other than 0 or 1)... and we need to somehow measure how good the estimates of the event probability are, from the entire dataset of predictions vs. outcomes... Thanks! (I'll think about the linked content some more, though.) – sesquipedalias, Jan 10 at 17:09
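(Editorial note, not part of the thread: one standard threshold-free measure with roughly the behavior described in the question's EDIT is the Brier score, the mean squared difference between each predicted probability and the 0/1 outcome. Lower is better; a model that always says 0.5 scores 0.25 regardless of outcomes, while saying 0.1 before events that occur is penalized heavily. A minimal sketch on the first three posted rows:)

    def brier_score(probs, outcomes):
        """Mean squared error between predicted probabilities and 0/1 outcomes."""
        return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

    print(brier_score([0.939658077, 0.705453465, 0.310251296],
                      [True, False, True]))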