All these theorems say that conditional distributions tend to be more concentrated. Are they all really one...














Consider a probability distribution, which I'll call the "prior distribution", and some functional of that distribution.



Also consider that same functional, but applied to the probability distribution after conditioning on the value of a random variable. Since what we're conditioning on is random, the value of this functional is itself a random variable. I'll call the conditioned distribution the "posterior distribution."



When this functional measures the concentration or dispersion of the distribution, we get theorems like these:




  • The expected value of the entropy of the posterior distribution is less than or equal to the entropy of the prior distribution


  • The expected value of the variance of the posterior distribution is less than or equal to the variance of the prior distribution


  • The expected value of the Euclidean norm of the posterior distribution is greater than or equal to the Euclidean norm of the prior distribution



(If it's not clear that the Euclidean norm measures the concentration of a distribution, consider that $\|p\|^2 = \sum_i p_i^2$, which is the probability of drawing the same element twice.)
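To make these claims concrete, here is a small numerical check of all three inequalities. The joint distribution below is made up purely for illustration, and the code is just a sketch, not part of any theorem:

```python
# Sketch: numerically check the three inequalities on a made-up joint
# distribution over (X, Y), conditioning on the evidence Y.
import numpy as np

# joint[i, j] = P(X = x_i, Y = y_j); values chosen arbitrarily, sum to 1.
joint = np.array([[0.10, 0.30],
                  [0.25, 0.05],
                  [0.15, 0.15]])
x_vals = np.array([0.0, 1.0, 2.0])   # support of X, used for the variance

prior = joint.sum(axis=1)            # marginal of X (the "prior")
p_y = joint.sum(axis=0)              # marginal of Y (the evidence)
posteriors = joint / p_y             # column j is P(X | Y = y_j)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def variance(p):
    mean = np.sum(p * x_vals)
    return np.sum(p * (x_vals - mean) ** 2)

def euclidean_norm(p):
    return np.linalg.norm(p)

for f, name, sense in [(entropy, "entropy", "<="),
                       (variance, "variance", "<="),
                       (euclidean_norm, "norm", ">=")]:
    # expectation of f(posterior) over the evidence Y
    expected_posterior = sum(p_y[j] * f(posteriors[:, j]) for j in range(len(p_y)))
    print(f"E[{name}(posterior)] = {expected_posterior:.4f} {sense} {name}(prior) = {f(prior):.4f}")
```

On this example the three expected posterior values come out to roughly 0.96, 0.65, and 0.65 against prior values of 1.09, 0.69, and 0.58, consistent with the three bullets above.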



Is there one theorem that has all three of these facts as special cases?










probability probability-theory information-theory conditional-probability bayesian






asked Jan 29 at 2:15









user54038









  • These seem like consequences of Jensen's inequality, which exploits the convexity of the various functionals you're interested in. – stochasticboy321, Jan 29 at 2:43
















1 Answer



















@stochasticboy321 basically answered this in the comments, so I'll just expand on it a bit.



I'll assume our distributions are discrete, and can thus be represented as vectors. Let $\vec{Q}$ be the posterior distribution. I'm capitalizing it because it's a random variable, since it depends on the evidence we will receive, which is random. Let $\vec{p}$ be the prior distribution. In all the cases above we have:



$$E_{\vec{p}}[C(\vec{Q})] \ge C(\vec{p})$$



where $C$ is some measure of concentration. In my three examples, it would be negative entropy, negative variance, and Euclidean norm, respectively. $E_{\vec{p}}$ represents expectation taken with respect to the distribution $\vec{p}$.



By the law of total probability,



$$\vec{p} = E_{\vec{p}}[\vec{Q}]$$



Substituting into the first formula,



$$E_{\vec{p}}[C(\vec{Q})] \ge C(E_{\vec{p}}[\vec{Q}])$$



This is now exactly Jensen's inequality, and thus it must hold as long as $C$ is convex. Convexity of $C$ is easily verified in each of my examples. For instance, in my third example, the case of the Euclidean norm, convexity is a quick consequence of the triangle inequality:



$$\|t\vec{p}_1 + (1-t)\vec{p}_2\| \le \|t \vec{p}_1\| + \|(1-t)\vec{p}_2\| = t\|\vec{p}_1\| + (1-t)\|\vec{p}_2\|$$
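For completeness, here is how the other two cases go; this is just the standard convexity argument spelled out, nothing specific to this problem. Negative entropy $C(\vec{p}) = \sum_i p_i \log p_i$ is convex because each summand is a convex function of one coordinate,

$$\frac{d^2}{dx^2}\bigl(x \log x\bigr) = \frac{1}{x} > 0 \quad \text{for } x > 0,$$

and a sum of convex functions is convex. Negative variance is convex because, for a fixed support $x_1, \dots, x_n$,

$$\operatorname{Var}(\vec{p}) = \sum_i p_i x_i^2 - \Bigl(\sum_i p_i x_i\Bigr)^2$$

is a linear function of $\vec{p}$ minus the square of a linear function of $\vec{p}$, i.e. linear minus convex, hence concave; so its negative is convex.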



It's interesting that I started out by phrasing this as "conditional distributions tend to be more concentrated," when really it would be more appropriate to say that conditional distributions tend to have higher expected values of any convex functional. And I'm not ready to say that a functional measures concentration if and only if it's convex.




















        answered Feb 8 at 3:31









user54038
