Why do we refer to the denominator of Bayes' theorem as “marginal probability”?
Consider the following characterization of Bayes' theorem:

Bayes' Theorem

Given some observed data $x$, the posterior probability that the parameter $\Theta$ has the value $\theta$ is $p(\theta \mid x) = p(x \mid \theta)\, p(\theta) / p(x)$, where $p(x \mid \theta)$ is the likelihood, $p(\theta)$ is the prior probability of the value $\theta$, and $p(x)$ is the marginal probability of the value $x$.
Is there any special reason why we call $p(x)$ the "marginal probability"? What is "marginal" about it?
probability probability-theory bayes-theorem
asked Jun 26 '15 at 2:34 by PP121; edited Jan 18 at 23:52 by nbro
"Marginal" does not mean "barely making the grade" but that the probability has been derived from a joint probability. The numerator in this instance is $p(x \mid \theta)\, p(\theta) = p(x, \theta)$, which is the joint probability, and $p(x)$ (as well as $p(\theta)$) are marginal probabilities, as they are derived from $p(x, \theta)$.
– Dilip Sarwate, Jun 26 '15 at 2:40
"Marginal probability" means the same thing here that it means in other contexts, i.e. "unconditional". (The meaning is context-dependent, since all probabilities are conditional.)
– Michael Hardy, Sep 27 '16 at 7:20
3 Answers
If you consider a joint distribution to be a table of values in columns and rows with their probabilities entered in the cells, then the "marginal distribution" is found by summing the values in the table along rows (or columns) and writing the totals in the margins of the table.
$$\begin{array}{c c} & X \\ \Theta & \boxed{\begin{array}{c|cc|c} ~ & 0 & 1 & X\mid \Theta \\ \hline 0 & 0.15 & 0.35 & 0.5 \\ 1 & 0.20 & 0.30 & 0.5 \\ \hline \Theta\mid X & 0.35 & 0.65 & ~ \end{array}}\end{array}$$
answered Jun 26 '15 at 2:39 by Graham Kemp; edited Jun 26 '15 at 2:49
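As a concrete sketch of the table idea (a minimal Python illustration using the same made-up numbers as the table above), the marginals are literally the row and column sums written "in the margins":

```python
# Joint distribution p(theta, x) from the table above:
# rows are theta in {0, 1}, columns are x in {0, 1}.
joint = [
    [0.15, 0.35],  # theta = 0
    [0.20, 0.30],  # theta = 1
]

# Marginal of theta: sum each row over x (the right-hand margin).
p_theta = [sum(row) for row in joint]

# Marginal of x: sum each column over theta (the bottom margin).
p_x = [sum(row[j] for row in joint) for j in range(2)]

print(p_theta)  # ≈ [0.5, 0.5]
print(p_x)      # ≈ [0.35, 0.65]
```

Note the comparisons are approximate only because of floating-point rounding; each margin still sums to 1.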
Yes, though I am not sure I understand your $\Theta \mid X$ or $X \mid \Theta$. I would have thought $0.65$ was $\mathbb{P}(X=1)$.
– Henry, Sep 27 '16 at 7:23
I hope you are aware of the fact that your answer is confusing. $\Theta \mid X$ is often used to represent "$\Theta$ given $X$" (a conditional), not a marginal; a marginal is just $X$ or $\Theta$. Furthermore, the OP asked why $p(x)$, the denominator, is called a "marginal" probability. I think the doubt lies in the fact that $p(x)$ is called marginal, whereas $p(\theta)$ is called the prior, but both can be calculated as marginals.
– nbro, Jan 18 at 23:49
To me, Bayes' theorem is all about inverting likelihood functions, and in that context calling it the marginal probability makes sense.
- Let's say I have an observation $c$,
- and a collection of states $\mathbf{s}=\{s_1,\ldots,s_n\}$ that could be causing that observation.
- Each of those states also defines a likelihood: $P(c\mid s_i)$,
- and we have a prior $P(s_i)$ (I'm assuming you have already motivated the prior; if not, ask another question on this site).
- So I want to know the state, based on the observation.
- If I just wanted to know the most likely state, and how the states compare to each other, I could define a scoring function, combining the likelihood of our observation given we are in the state with the chance of being in the state: $$\operatorname{score}_c(s_i)= P(c\mid s_i)P(s_i)$$
- Then to find the most likely state $s^\star$, I would just find the argmax: $$s^\star = \operatorname{argmax}_{s_i \in \mathbf{s}} \operatorname{score}_c(s_i) = \operatorname{argmax}_{s_i \in \mathbf{s}} P(c\mid s_i)P(s_i)$$
- That score function is quite nice. We can think of a score vector, which holds all the scores, and we can see which state is the most likely and which is the least. But it does not sum to one. We'd like it to sum to one, so we normalise it and call it a probability (even if it isn't one a priori, it will turn out that it is). The normalised score obviously depends on $c$, so it will be $P(s_i\mid c)$. It is given by
$$P(s_i\mid c)=\dfrac{\operatorname{score}_c(s_i)}{\sum_{s_j\in \mathbf{s}} \operatorname{score}_c(s_j) } = \dfrac{P(c\mid s_i)P(s_i)}{\sum_{s_j\in \mathbf{s}} P(c\mid s_j)P(s_j) }$$
- The above is a very useful form of Bayes' theorem.
- Let's take a closer look at the bottom line:
$$\sum_{s_j\in \mathbf{s}} P(c\mid s_j)P(s_j) = \sum_{s_j\in \mathbf{s}} P(c,s_j)$$
- So we are summing the joint probability over all possible values that one of its fields can take. That is the very definition of the marginal probability of the other field:
$$P(c) = \sum_{s_j\in \mathbf{s}} P(c,s_j)$$
Our bottom line, the normalising factor that makes everything sum to one, is just the marginal probability of $c$. Substituting that back in:
$$P(s_i\mid c) = \dfrac{P(c\mid s_i)P(s_i)}{P(c)}$$
So the bottom line $P(c)$ was just a marginal probability, which we find by summing over all possible values of the other field ($s_j$) in the top line.
answered Sep 27 '16 at 6:41 by Lyndon White; edited Sep 27 '16 at 7:19 by Michael Hardy
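The scoring-and-normalising recipe above can be sketched in a few lines of Python (the state names, prior, and likelihood values here are made up purely for illustration):

```python
# Hypothetical setup: three states that could have caused observation c.
prior = {"s1": 0.5, "s2": 0.3, "s3": 0.2}          # P(s_i)
likelihood = {"s1": 0.10, "s2": 0.40, "s3": 0.25}  # P(c | s_i)

# Unnormalised score: P(c | s_i) P(s_i), i.e. the joint P(c, s_i).
score = {s: likelihood[s] * prior[s] for s in prior}

# The normalising constant is the marginal P(c) = sum_j P(c, s_j).
p_c = sum(score.values())

# Posterior: the normalised scores, P(s_i | c). These sum to one.
posterior = {s: score[s] / p_c for s in score}

# The most likely state is the argmax of the score (or of the posterior,
# since dividing by the constant p_c does not change the argmax).
s_star = max(score, key=score.get)

print(p_c)     # ≈ 0.22
print(s_star)  # "s2"
```

Normalising by $P(c)$ leaves the ranking of the states untouched, which is why the argmax can be taken before or after normalisation.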
Superb explanation. If you (or anyone else) could provide a motivation for the prior, I'd be grateful.
– blz, Nov 2 '18 at 14:52
Please ask a separate question and link back to this Q&A.
– Lyndon White, Nov 3 '18 at 15:23
The explanation I was given when I was taught conditional probabilities is that if you draw up a table of the probabilities $p(x,y)$, then the row/column sums
$$ p(x) = \sum_{y} p(x,y) $$
(by the law of total probability) are written in the margins of the table.
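For instance (a minimal sketch with a made-up joint pmf), $p(x)$ falls out of the joint table by summing over $y$:

```python
# Hypothetical joint pmf stored as a dict keyed by (x, y).
p_xy = {
    ("a", 0): 0.1, ("a", 1): 0.3,
    ("b", 0): 0.4, ("b", 1): 0.2,
}

# Marginal p(x): sum the joint over all y (law of total probability).
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

print(p_x)  # ≈ {"a": 0.4, "b": 0.6}
```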
By $p(x,y)$ do you just mean $p(x \wedge y)$ (i.e., the probability of $x$ and $y$ co-occurring)?
– PP121, Jun 26 '15 at 2:42
@PP121 Yes. It's an abbreviation for the joint probability. More specifically, $p_{X,Y}(x,y) = \mathsf P(X=x \cap Y=y)$.
– Graham Kemp, Jun 26 '15 at 2:44
So then what is analogous to $X$ and $Y$ in the original example I wrote out? Is it $X = \{\ldots x \ldots\}$ and $\Theta = \{\ldots \theta \ldots \}$?
– PP121, Jun 26 '15 at 2:49
@PP121 Yes, that would be so: $p(x\mid \theta) = P(X=x\mid \Theta=\theta)$.
– Graham Kemp, Jun 26 '15 at 2:51
@PP121 No; it is literal, at least for discrete random variables. $X$ being a discrete random variable means that, on inspection, it will be found to have one of the values within the sample space with a certain probability. For continuous random variables the appropriate measure is a probability density and things are somewhat more involved, but mostly the same principles apply.
– Graham Kemp, Jun 26 '15 at 3:56
$begingroup$
If you consider a joint distribution to be a table of values in columns and rows with there probabilities entered in the cells, then the "marginal distribution" is found by summing the values in the table along rows (or columns) and writing the total in the margins of the table.
$$begin{array}{c c} & X \ Theta & boxed{begin{array}{c|cc|c} ~ & 0 & 1 & Xmid Theta \ hline 0 & 0.15 & 0.35 & 0.5 \ 1 & 0.20 & 0.30 & 0.5 \hline Thetamid X & 0.35 & 0.65 & ~end{array}}end{array}$$
$endgroup$
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
add a comment |
$begingroup$
If you consider a joint distribution to be a table of values in columns and rows with there probabilities entered in the cells, then the "marginal distribution" is found by summing the values in the table along rows (or columns) and writing the total in the margins of the table.
$$begin{array}{c c} & X \ Theta & boxed{begin{array}{c|cc|c} ~ & 0 & 1 & Xmid Theta \ hline 0 & 0.15 & 0.35 & 0.5 \ 1 & 0.20 & 0.30 & 0.5 \hline Thetamid X & 0.35 & 0.65 & ~end{array}}end{array}$$
$endgroup$
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
add a comment |
$begingroup$
If you consider a joint distribution to be a table of values in columns and rows with there probabilities entered in the cells, then the "marginal distribution" is found by summing the values in the table along rows (or columns) and writing the total in the margins of the table.
$$begin{array}{c c} & X \ Theta & boxed{begin{array}{c|cc|c} ~ & 0 & 1 & Xmid Theta \ hline 0 & 0.15 & 0.35 & 0.5 \ 1 & 0.20 & 0.30 & 0.5 \hline Thetamid X & 0.35 & 0.65 & ~end{array}}end{array}$$
$endgroup$
If you consider a joint distribution to be a table of values in columns and rows with there probabilities entered in the cells, then the "marginal distribution" is found by summing the values in the table along rows (or columns) and writing the total in the margins of the table.
$$begin{array}{c c} & X \ Theta & boxed{begin{array}{c|cc|c} ~ & 0 & 1 & Xmid Theta \ hline 0 & 0.15 & 0.35 & 0.5 \ 1 & 0.20 & 0.30 & 0.5 \hline Thetamid X & 0.35 & 0.65 & ~end{array}}end{array}$$
edited Jun 26 '15 at 2:49
answered Jun 26 '15 at 2:39


Graham KempGraham Kemp
86.2k43478
86.2k43478
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
add a comment |
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
Yes, though I am not sure I understand your $Theta mid X$ or $X mid Theta$. I would have thought $0.65$ was $mathbb{P}(X=1)$
$endgroup$
– Henry
Sep 27 '16 at 7:23
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
$begingroup$
I hope you are aware of the fact that your answer is confusing. $Theta mid X$ is often used to represent "$Theta$ given $X$" (a conditional) and not to represent a marginal. A marginal is just X or $Theta$. Furthermore, the OP asked why p(x), the denominator is called a "marginal" probability. I think that the doubt lies in the fact that p(x) is called marginal, whereas $p(theta)$ is called prior, but both can be calculated as marginals.
$endgroup$
– nbro
Jan 18 at 23:49
add a comment |
$begingroup$
To me, bayes theorem is all about inverting likelihood functions, and in that context calling it marginal probabity makes sense.
- Lets say I have a observation $c$,
- and a collection of states $mathbf{s}={s_1,ldots,s_n}$, that could be causing that observation.
- And each of those states also defines a likelihood: $P(cmid s_i)$
- as well we have a prior $P(s_i)$ (I'm assuming you have already motivated the prior, if not ask another question on this site)
- So I want to know the state, based on the variable
- If I just wanted to know the most likely state, and how they compair to each other, I could define a scoring function -- combining the likelihood of our observation given we are in the state, with the change of being in the state: $$operatorname{score}_c(s_i)= P(cmid s_i)P(s_i)$$
- Then to find the most likely state $s^star$, i would just find the argmax $$s^star = operatorname{argmax}_{forall s_i in mathbf{s}} operatorname{score}_c(s_i) = operatorname{argmax}_{forall s_i in mathbf{s}} P(cmid s_i)P(s_i) $$
- That score function is quiet nice. We can think of a score vector, which has all the scores and we can see which is the most likely, and which is the least. But it does not sum to one. We'ld like to make it sum to one -- we would normalise it and call it a probability (even if it isn't -- but it will turn out it is). Our normalised score obviously depends on $c$ so it will be $P(s_imid c)$. The normalised score is given by
$$P(s_imid c)=dfrac{operatorname{score}_c(s_i)}{sum_{forall s_jin mathbf{s}} operatorname{score}_c(s_j) } = dfrac{P(cmid s_i)P(s_i)}{sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) }$$
- the above is a very useful form of Bayes Theorem.
- let's take a closer look at the bottom line:
$$sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
- So we are summing the Joint probability, over all possible values that one of its fields can take. That is the very definition of the marginal probability of the other field.
$$P(c) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
Our bottom like -- the normalising factor to make it sum to one -- that is just the marginal probability of $c$. Substituting that back in:
$$P(s_imid c) = dfrac{P(cmid s_i)P(s_i)}{P(c)}$$
So the bottom line $P(c)$ was just a marginally probability, that we find by summing over all possible values for the other field ($s_i$) in the top line.
$endgroup$
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
add a comment |
$begingroup$
To me, bayes theorem is all about inverting likelihood functions, and in that context calling it marginal probabity makes sense.
- Lets say I have a observation $c$,
- and a collection of states $mathbf{s}={s_1,ldots,s_n}$, that could be causing that observation.
- And each of those states also defines a likelihood: $P(cmid s_i)$
- as well we have a prior $P(s_i)$ (I'm assuming you have already motivated the prior, if not ask another question on this site)
- So I want to know the state, based on the variable
- If I just wanted to know the most likely state, and how they compair to each other, I could define a scoring function -- combining the likelihood of our observation given we are in the state, with the change of being in the state: $$operatorname{score}_c(s_i)= P(cmid s_i)P(s_i)$$
- Then to find the most likely state $s^star$, i would just find the argmax $$s^star = operatorname{argmax}_{forall s_i in mathbf{s}} operatorname{score}_c(s_i) = operatorname{argmax}_{forall s_i in mathbf{s}} P(cmid s_i)P(s_i) $$
- That score function is quiet nice. We can think of a score vector, which has all the scores and we can see which is the most likely, and which is the least. But it does not sum to one. We'ld like to make it sum to one -- we would normalise it and call it a probability (even if it isn't -- but it will turn out it is). Our normalised score obviously depends on $c$ so it will be $P(s_imid c)$. The normalised score is given by
$$P(s_imid c)=dfrac{operatorname{score}_c(s_i)}{sum_{forall s_jin mathbf{s}} operatorname{score}_c(s_j) } = dfrac{P(cmid s_i)P(s_i)}{sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) }$$
- the above is a very useful form of Bayes Theorem.
- let's take a closer look at the bottom line:
$$sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
- So we are summing the Joint probability, over all possible values that one of its fields can take. That is the very definition of the marginal probability of the other field.
$$P(c) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
Our bottom like -- the normalising factor to make it sum to one -- that is just the marginal probability of $c$. Substituting that back in:
$$P(s_imid c) = dfrac{P(cmid s_i)P(s_i)}{P(c)}$$
So the bottom line $P(c)$ was just a marginally probability, that we find by summing over all possible values for the other field ($s_i$) in the top line.
$endgroup$
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
add a comment |
$begingroup$
To me, bayes theorem is all about inverting likelihood functions, and in that context calling it marginal probabity makes sense.
- Lets say I have a observation $c$,
- and a collection of states $mathbf{s}={s_1,ldots,s_n}$, that could be causing that observation.
- And each of those states also defines a likelihood: $P(cmid s_i)$
- as well we have a prior $P(s_i)$ (I'm assuming you have already motivated the prior, if not ask another question on this site)
- So I want to know the state, based on the variable
- If I just wanted to know the most likely state, and how they compair to each other, I could define a scoring function -- combining the likelihood of our observation given we are in the state, with the change of being in the state: $$operatorname{score}_c(s_i)= P(cmid s_i)P(s_i)$$
- Then to find the most likely state $s^star$, i would just find the argmax $$s^star = operatorname{argmax}_{forall s_i in mathbf{s}} operatorname{score}_c(s_i) = operatorname{argmax}_{forall s_i in mathbf{s}} P(cmid s_i)P(s_i) $$
- That score function is quiet nice. We can think of a score vector, which has all the scores and we can see which is the most likely, and which is the least. But it does not sum to one. We'ld like to make it sum to one -- we would normalise it and call it a probability (even if it isn't -- but it will turn out it is). Our normalised score obviously depends on $c$ so it will be $P(s_imid c)$. The normalised score is given by
$$P(s_imid c)=dfrac{operatorname{score}_c(s_i)}{sum_{forall s_jin mathbf{s}} operatorname{score}_c(s_j) } = dfrac{P(cmid s_i)P(s_i)}{sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) }$$
- the above is a very useful form of Bayes Theorem.
- let's take a closer look at the bottom line:
$$sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
- So we are summing the Joint probability, over all possible values that one of its fields can take. That is the very definition of the marginal probability of the other field.
$$P(c) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
Our bottom like -- the normalising factor to make it sum to one -- that is just the marginal probability of $c$. Substituting that back in:
$$P(s_imid c) = dfrac{P(cmid s_i)P(s_i)}{P(c)}$$
So the bottom line $P(c)$ was just a marginally probability, that we find by summing over all possible values for the other field ($s_i$) in the top line.
$endgroup$
To me, bayes theorem is all about inverting likelihood functions, and in that context calling it marginal probabity makes sense.
- Lets say I have a observation $c$,
- and a collection of states $mathbf{s}={s_1,ldots,s_n}$, that could be causing that observation.
- And each of those states also defines a likelihood: $P(cmid s_i)$
- as well we have a prior $P(s_i)$ (I'm assuming you have already motivated the prior, if not ask another question on this site)
- So I want to know the state, based on the variable
- If I just wanted to know the most likely state, and how they compair to each other, I could define a scoring function -- combining the likelihood of our observation given we are in the state, with the change of being in the state: $$operatorname{score}_c(s_i)= P(cmid s_i)P(s_i)$$
- Then to find the most likely state $s^star$, i would just find the argmax $$s^star = operatorname{argmax}_{forall s_i in mathbf{s}} operatorname{score}_c(s_i) = operatorname{argmax}_{forall s_i in mathbf{s}} P(cmid s_i)P(s_i) $$
- That score function is quiet nice. We can think of a score vector, which has all the scores and we can see which is the most likely, and which is the least. But it does not sum to one. We'ld like to make it sum to one -- we would normalise it and call it a probability (even if it isn't -- but it will turn out it is). Our normalised score obviously depends on $c$ so it will be $P(s_imid c)$. The normalised score is given by
$$P(s_imid c)=dfrac{operatorname{score}_c(s_i)}{sum_{forall s_jin mathbf{s}} operatorname{score}_c(s_j) } = dfrac{P(cmid s_i)P(s_i)}{sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) }$$
- the above is a very useful form of Bayes Theorem.
- let's take a closer look at the bottom line:
$$sum_{forall s_jin mathbf{s}} P(cmid s_j)P(s_j) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
- So we are summing the Joint probability, over all possible values that one of its fields can take. That is the very definition of the marginal probability of the other field.
$$P(c) = sum_{forall s_jin mathbf{s}} P(c,s_j)$$
Our bottom like -- the normalising factor to make it sum to one -- that is just the marginal probability of $c$. Substituting that back in:
$$P(s_imid c) = dfrac{P(cmid s_i)P(s_i)}{P(c)}$$
So the bottom line $P(c)$ was just a marginally probability, that we find by summing over all possible values for the other field ($s_i$) in the top line.
edited Sep 27 '16 at 7:19
Michael Hardy
1
1
answered Sep 27 '16 at 6:41


Lyndon WhiteLyndon White
6551620
6551620
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
add a comment |
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Superb explanation. If you (or anyone else) could provide a motivation for the prior, i'd be grateful.
$endgroup$
– blz
Nov 2 '18 at 14:52
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
$begingroup$
Please ask a separate question and link back to this QA
$endgroup$
– Lyndon White
Nov 3 '18 at 15:23
add a comment |
$begingroup$
The explanation I was given when I was taught conditional probabilities is that if you draw up a table of the probabilities $p(x,y)$, then the row/column sums
$$ p(x) = sum_{y} p(x,y) $$
(by the law of total probability) are written in the margins of the table.
$endgroup$
$begingroup$
By $p(x,y)$ do you just mean $p(x wedge y)$ (i.e., the probability of $x$ and $y$ co-occurring)?
$endgroup$
– PP121
Jun 26 '15 at 2:42
1
$begingroup$
@PP121 Yes. It's an abbreviation for the joint probability. More specifically $p_{X,Y}(x,y) = mathsf P(X=x cap Y=y)$.
$endgroup$
– Graham Kemp
Jun 26 '15 at 2:44
$begingroup$
So then what is analogous to $X$ and $Y$ in the original example I wrote out? Is it $X = {ldots x ldots}$ and $Theta = {ldots theta ldots }$?
$endgroup$
– PP121
Jun 26 '15 at 2:49
1
$begingroup$
Yes. @PP121 That would be so. $p(xmid theta) = P(X=xmid Theta=theta)$
$endgroup$
– Graham Kemp
Jun 26 '15 at 2:51
1
$begingroup$
@PP121 No; it is literal, at least for discrete random variables. $X$ being a discrete random variable means that, on inspection, it will be found to have one of the values within the sample space with a certain probability. For continuous random variables the appropriate measure is a probability density and things are somewhat more involved, but mostly the same principles apply.
$endgroup$
– Graham Kemp
Jun 26 '15 at 3:56
edited Jan 19 at 5:21
answered Jun 26 '15 at 2:38
– Chappers
Thanks for contributing an answer to Mathematics Stack Exchange!