All these theorems say that conditional distributions tend to be more concentrated. Are they all really one...
Consider a probability distribution, which I'll call the "prior distribution", and some functional of that distribution.
Also consider that same functional, but applied to the probability distribution after conditioning on the value of a random variable. Since what we're conditioning on is random, this functional is itself random. I'll call the conditioned distribution the "posterior distribution".
When this functional measures the concentration or dispersion of the distribution, we get theorems like these:
- The expected value of the entropy of the posterior distribution is less than or equal to the entropy of the prior distribution.
- The expected value of the variance of the posterior distribution is less than or equal to the variance of the prior distribution.
- The expected value of the Euclidean norm of the posterior distribution is greater than or equal to the Euclidean norm of the prior distribution.

(If it's not clear that the Euclidean norm measures the concentration of a distribution, consider that $\|p\|^2 = \sum_i p_i^2$, which is the probability of drawing the same element twice.)
Is there one theorem that has all three of these facts as special cases?
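For concreteness, here is a small numerical sanity check of the three claims on a toy joint distribution (the numbers, and the use of NumPy, are arbitrary choices for illustration; the variance treats the outcomes of the underlying variable as the numeric values $0, 1, 2$):

```python
# A numerical sanity check of the three claims; the joint distribution of
# (X, Y) below is arbitrary and chosen purely for illustration.
import numpy as np

joint = np.array([[0.10, 0.15, 0.05],   # joint[y, x] = P(Y = y, X = x)
                  [0.20, 0.05, 0.15],
                  [0.05, 0.10, 0.15]])

p_y = joint.sum(axis=1)                 # marginal of Y (the evidence)
prior = joint.sum(axis=0)               # prior (marginal) distribution of X
posteriors = joint / p_y[:, None]       # posteriors[y] = P(X = . | Y = y)
x_vals = np.arange(joint.shape[1])      # numeric values of X, used for the variance

def entropy(p):
    return -np.sum(p * np.log(p))

def variance(p):
    mean = np.sum(x_vals * p)
    return np.sum(p * (x_vals - mean) ** 2)

def euclidean_norm(p):
    return np.linalg.norm(p)

checks = [("entropy", entropy, np.less_equal),          # E[H(post)] <= H(prior)
          ("variance", variance, np.less_equal),        # E[Var(post)] <= Var(prior)
          ("norm", euclidean_norm, np.greater_equal)]   # E[||post||] >= ||prior||

for name, func, holds in checks:
    avg_posterior_value = np.dot(p_y, [func(q) for q in posteriors])
    print(f"{name:8s} E[posterior] = {avg_posterior_value:.4f}  "
          f"prior = {func(prior):.4f}  inequality holds: {holds(avg_posterior_value, func(prior))}")
```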
Tags: probability, probability-theory, information-theory, conditional-probability, bayesian
asked Jan 29 at 2:15 by user54038
These seem like consequences of Jensen's inequality, which exploits the convexity of the various functionals you're interested in.
– stochasticboy321, Jan 29 at 2:43
1 Answer
@stochasticboy321 basically answered this in the comments, so I'll just expand on it a bit.
I'll assume our distributions are discrete, so they can be represented as vectors. Let $\vec{Q}$ be the posterior distribution. I'm capitalizing it because it's a random variable: it depends on the evidence we will receive, which is random. Let $\vec{p}$ be the prior distribution. In all the cases above we have
$$E_{\vec{p}}[C(\vec{Q})] \ge C(\vec{p})$$
where $C$ is some measure of concentration. In my three examples, $C$ is the negative entropy, the negative variance, and the Euclidean norm, respectively, and $E_{\vec{p}}$ denotes the expectation over the random evidence, whose distribution is determined by the prior $\vec{p}$ together with the conditional model.
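To make the expectation explicit (writing $X$ for the underlying variable and $Y$ for the random evidence, symbols not used above), the left-hand side is an average over the possible observations, weighted by their marginal probabilities:
$$E_{\vec{p}}[C(\vec{Q})] = \sum_y P(Y = y)\, C\big(\vec{Q}(y)\big), \qquad \vec{Q}(y)_i = P(X = i \mid Y = y).$$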
By the law of total probability,
$$\vec{p} = E_{\vec{p}}[\vec{Q}].$$
Substituting into the first formula,
$$E_{\vec{p}}[C(\vec{Q})] \ge C\big(E_{\vec{p}}[\vec{Q}]\big).$$
This is now exactly Jensen's inequality, so it must hold as long as $C$ is convex. Convexity of $C$ is easily verified in each of my examples. For instance, in my third example, the Euclidean norm, convexity is a quick consequence of the triangle inequality: for $0 \le t \le 1$,
$$\|t\vec{p}_1 + (1-t)\vec{p}_2\| \le \|t \vec{p}_1\| + \|(1-t)\vec{p}_2\| = t\|\vec{p}_1\| + (1-t)\|\vec{p}_2\|.$$
It's interesting that I started out by phrasing this as "conditional distributions tend to be more concentrated," when really it would be more appropriate to say that conditional distributions tend to have higher values of any convex functional; I'm not ready to say that a functional measures concentration if and only if it is convex.
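For example, taking $C$ to be the negative entropy, the general inequality specializes to the familiar fact that conditioning cannot increase entropy on average (with $X$ and $Y$ as above):
$$E_{\vec{p}}[-H(\vec{Q})] \ge -H(\vec{p}) \quad\Longleftrightarrow\quad \sum_y P(Y = y)\, H(X \mid Y = y) \le H(X),$$
i.e. $H(X \mid Y) \le H(X)$. Similarly, the variance case reads $E[\operatorname{Var}(X \mid Y)] \le \operatorname{Var}(X)$, which is one half of the law of total variance.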
answered Feb 8 at 3:31 by user54038