Schwarz inequality in linear algebra and probability theory

Linear algebra states the Schwarz inequality as
$$\lvert\mathbf x^\mathrm T\mathbf y\rvert\le\lVert\mathbf x\rVert\lVert\mathbf y\rVert\tag 1$$
whereas probability theory states it as
$$(\mathbf E[XY])^2\le\mathbf E[X^2]\mathbf E[Y^2]\tag 2$$
By comparing $\lvert\sum_i x_iy_i\rvert\le\sqrt{\sum_i x_i^2\sum_i y_i^2}$ with $\lvert\sum_y\sum_x xy\,p_{X,Y}(x,y)\rvert\le\sqrt{\sum_x x^2p_X(x)\sum_y y^2p_Y(y)}$, we see that $(1)$ and $(2)$ are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise}\end{cases}$. Thus, $(2)$ can be thought of as a more general form of the inequality.
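To make the comparison concrete, here is a minimal numerical sketch (NumPy, not part of the original question; the variable names are mine). Taking $(X,Y)$ uniform on the pairs $(x_i,y_i)$ gives $\mathbf E[XY]=\frac{\mathbf x^\mathrm T\mathbf y}n$ and $\mathbf E[X^2]=\frac{\mathbf x^\mathrm T\mathbf x}n$, so the left side of $(2)$ is the left side of $(1)$ squared and divided by $n^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)
y = rng.normal(size=n)

# Linear-algebra form (1):  |x^T y| <= ||x|| ||y||
lhs1 = abs(x @ y)
rhs1 = np.linalg.norm(x) * np.linalg.norm(y)

# Probability form (2) with (X, Y) uniform on the pairs (x_i, y_i), each with mass 1/n
E_XY = np.mean(x * y)      # E[XY]  = x^T y / n
E_X2 = np.mean(x ** 2)     # E[X^2] = x^T x / n
E_Y2 = np.mean(y ** 2)

print(lhs1 <= rhs1)                                   # inequality (1) holds
print(E_XY ** 2 <= E_X2 * E_Y2)                       # inequality (2) holds
print(np.isclose(E_XY ** 2 * n ** 2, (x @ y) ** 2))   # (2)'s LHS * n^2 = (1)'s LHS squared
```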



Another way to think about this is to compare $\lvert\cos\theta\rvert=\frac{\lvert\mathbf x^\mathrm T\mathbf y\rvert}{\lVert\mathbf x\rVert\lVert\mathbf y\rVert}\le1$ with $\lvert\rho\rvert=\frac{\lvert\mathbf{cov}(X,Y)\rvert}{\sqrt{\mathbf{var}(X)\mathbf{var}(Y)}}\le1$. The former is exactly $(1)$, while the latter becomes $(2)$ only when $\mathbf E[X]=\mathbf E[Y]=0$. In some sense, we can view $\mathbf x^\mathrm T\mathbf y$ as a special form of $\mathbf{cov}(X,Y)$. Then, it follows that $\mathbf x^\mathrm T\mathbf x$ is a form of $\mathbf{var}(X)$ and $\lVert\mathbf x\rVert$ is a form of $\sqrt{\mathbf{var}(X)}$.
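The following sketch (again NumPy, an assumption of mine rather than part of the post) illustrates the caveat: `np.corrcoef` returns the empirical correlation, which coincides with the cosine of the angle between the *centered* vectors, while the cosine of the raw vectors is generally different:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=7) + 3.0    # deliberately non-zero mean
y = rng.normal(size=7) - 1.0

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
rho = np.corrcoef(x, y)[0, 1]   # empirical correlation coefficient
print(cos_theta, rho)           # generally different for raw vectors

# After centering (E[X] = E[Y] = 0), cos(theta) of the centered vectors equals rho.
xc, yc = x - x.mean(), y - y.mean()
cos_centered = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
print(np.isclose(cos_centered, rho))
```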



What is the special form of $\mathbf E[X]$, and how do we understand $\mathbf E[X]=\mathbf E[Y]=0$ in linear algebra? With $p_{X,Y}$ defined above, we have $\mathbf E[XY]=\frac{\mathbf x^\mathrm T\mathbf y}n$, but $\mathbf{cov}(X,Y)\ne\mathbf E[XY]$ unless $\mathbf E[X]=0$ or $\mathbf E[Y]=0$. How can we obtain a relation between $\mathbf{cov}(X,Y)$ and $\mathbf x^\mathrm T\mathbf y$?

linear-algebra probability-theory cauchy-schwarz-inequality






asked Jan 13 at 12:33, edited Jan 15 at 7:50 – W. Zhu

  • Not "the same as", rather "a particular case of" (can you spot how?).
    – Did
    Jan 13 at 13:01

  • @Did The two inequalities are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise}\end{cases}$!
    – W. Zhu
    Jan 14 at 2:59

  • Thus, question solved?
    – Did
    Jan 14 at 11:26

  • @Did I have one more question. If we write $\mathbf{cov}(X, Y)$ as $\mathbf x^\mathrm T\mathbf y$, then $\lvert\rho\rvert\le1$ becomes $\lvert\cos\theta\rvert\le1$. But we need to set $\mathbf E[X]=\mathbf E[Y]=0$, which means that the components of each of $\mathbf x$ and $\mathbf y$ average to zero. Shouldn't the inequality hold for all vectors $\mathbf x$ and $\mathbf y$?
    – W. Zhu
    Jan 14 at 15:07

  • I don't understand the downvote, as is often the case when there's no comment accompanying it. Anyway, there's a recent question on the covariance which addresses exactly the doubts of this post.
    – Giuseppe Negro
    Jan 15 at 13:32

1 Answer

After reading J.G.'s answer and thinking it over, I have arrived at a satisfactory answer. I will post my thoughts below.



Let $\mathbf x\in\Bbb R^n$ denote a discrete uniform random variable, with each component corresponding to one outcome. Then $\mathbf E[\mathbf x]$ is the average of the components, and $\mathbf E[\mathbf x]=0$ means that the components sum to zero. Thus, for zero-mean random variables, we can choose $n-1$ components freely and set the last component to $-\sum_{i=1}^{n-1}x_i$. These vectors form an $(n-1)$-dimensional subspace. We can bring any vector into this centered subspace $C$ by subtracting from each component the average of all the components.
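A minimal sketch of this centering step (NumPy; the helper name `center` is mine): subtracting the componentwise mean puts a vector into $C$, and the resulting components indeed sum to zero.

```python
import numpy as np

def center(v):
    """Move v into the centered subspace C by removing the mean of its components."""
    return v - v.mean()

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
xc = center(x)
print(xc)                          # the centered vector
print(np.isclose(xc.sum(), 0.0))   # True: xc lies in C, i.e. E[xc] = 0
```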



Now we consider two vectors $\mathbf x$ and $\mathbf y$ in $C$. We can use a matrix to represent the joint distribution. Put the $x_i$'s in the rows and the $y_i$'s in the columns, and consider this joint distribution matrix:
$$D=
\begin{bmatrix}
\frac1n&0&0&\cdots&0\\
0&\frac1n&0&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&\frac1n
\end{bmatrix}$$



This distribution is special because it puts equal weight on each diagonal entry and zero weight on the off-diagonal entries. We may call it the discrete uniform diagonal joint distribution. It is easy to see that $\mathbf x$ and $\mathbf y$ are each discrete uniform but not independent ($\mathbf x$ taking the value $x_i$ forces $\mathbf y$ to take the value $y_i$).



Under these assumptions, $\mathbf{cov}(\mathbf x,\mathbf y)=\frac{\mathbf x^\mathrm T\mathbf y}n$, $\mathbf{var}(\mathbf x)=\frac{\mathbf x^\mathrm T\mathbf x}n$, $\sigma_{\mathbf x}=\frac{\lVert\mathbf x\rVert}{\sqrt n}$ and $\rho=\frac{\mathbf{cov}(\mathbf x,\mathbf y)}{\sigma_{\mathbf x}\sigma_{\mathbf y}}=\frac{\mathbf x^\mathrm T\mathbf y}{\lVert\mathbf x\rVert\lVert\mathbf y\rVert}=\cos\theta$. When $\mathbf x$ and $\mathbf y$ are orthogonal vectors, they are uncorrelated random variables. Although they are linearly independent as vectors, they are not independent as random variables.
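Here is a quick numerical check of this dictionary (a NumPy sketch of mine, not part of the original answer), using two centered vectors and the uniform diagonal joint distribution, i.e. $(X,Y)=(x_i,y_i)$ with probability $\frac1n$ each:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Two centered vectors in C: the components of each average to zero.
x = rng.normal(size=n); x -= x.mean()
y = rng.normal(size=n); y -= y.mean()

cov_xy  = np.mean(x * y)   # E[XY], and the means are zero, so this is cov(X, Y)
var_x   = np.mean(x * x)
sigma_x = np.sqrt(var_x)

print(np.isclose(cov_xy, x @ y / n))                        # cov = x^T y / n
print(np.isclose(var_x, x @ x / n))                         # var = x^T x / n
print(np.isclose(sigma_x, np.linalg.norm(x) / np.sqrt(n)))  # sigma = ||x|| / sqrt(n)

rho = cov_xy / (sigma_x * np.sqrt(np.mean(y * y)))
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.isclose(rho, cos_theta))                           # rho = cos(theta)
```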



Now we have a correspondence between covariance and dot product, standard deviation and length, correlation coefficient and the cosine of the angle between two vectors, and uncorrelatedness and orthogonality. Thus, the Schwarz inequality $\lvert\cos\theta\rvert\le1$ matches $\lvert\rho\rvert\le1$.



Let us look at three more examples that connect linear algebra to probability theory (a numerical check of the first two appears after this list):


  1. The triangle inequality $\lVert\mathbf x+\mathbf y\rVert\le\lVert\mathbf x\rVert+\lVert\mathbf y\rVert$ matches $\sigma_{X+Y}\le\sigma_X+\sigma_Y$.

  2. $(\mathbf x+\mathbf y)^\mathrm T(\mathbf x+\mathbf y)=\mathbf x^\mathrm T\mathbf x+\mathbf y^\mathrm T\mathbf y+2\mathbf x^\mathrm T\mathbf y$ matches $\mathbf{var}(X+Y)=\mathbf{var}(X)+\mathbf{var}(Y)+2\,\mathbf{cov}(X,Y)$.

  3. The Pythagorean theorem $\lVert\mathbf b\rVert^2=\lVert\mathbf p\rVert^2+\lVert\mathbf e\rVert^2$, with orthogonal projection $\mathbf p$ and error $\mathbf e=\mathbf b-\mathbf p$, matches $\mathbf{var}(\Theta)=\mathbf{var}(\hat\Theta)+\mathbf{var}(\tilde\Theta)$, where the estimator $\hat\Theta$ is uncorrelated with the estimation error $\tilde\Theta=\Theta-\hat\Theta$. In fact, this is just the law of total variance $\mathbf{var}(\Theta)=\mathbf{var}(\mathbf E[\Theta|X])+\mathbf E[\mathbf{var}(\Theta|X)]$ with $\hat\Theta=\mathbf E[\Theta|X]$.
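As promised above, a short sanity check of examples 1 and 2 under the same "one outcome per component" uniform model (a NumPy sketch of mine; the helper lambdas are assumptions, not standard library functions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.normal(size=n); x -= x.mean()   # centered vectors = zero-mean random variables
y = rng.normal(size=n); y -= y.mean()

var   = lambda v: np.mean(v * v)        # variance under the uniform diagonal model
cov   = lambda u, v: np.mean(u * v)
sigma = lambda v: np.sqrt(var(v))

# Example 1: triangle inequality  <->  sigma_{X+Y} <= sigma_X + sigma_Y
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))
print(sigma(x + y) <= sigma(x) + sigma(y))

# Example 2: expanding (x+y)^T(x+y)  <->  var(X+Y) = var(X) + var(Y) + 2 cov(X,Y)
print(np.isclose((x + y) @ (x + y), x @ x + y @ y + 2 * (x @ y)))
print(np.isclose(var(x + y), var(x) + var(y) + 2 * cov(x, y)))
```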






answered Jan 16 at 11:22, edited Jan 16 at 13:20 – W. Zhu





























