Schwarz inequality in linear algebra and probability theory
Linear algebra states the Schwarz inequality as
$$\lvert\mathbf x^\mathrm T\mathbf y\rvert\le\lVert\mathbf x\rVert\lVert\mathbf y\rVert\tag 1$$
However, probability theory states it as
$$(\mathbf E[XY])^2\le\mathbf E[X^2]\mathbf E[Y^2]\tag 2$$
By comparing $\lvert\sum_i x_iy_i\rvert\le\sqrt{\sum_i x_i^2\sum_i y_i^2}$ with $\lvert\sum_y\sum_x xy\,p_{X,Y}(x,y)\rvert\le\sqrt{\sum_x x^2p_X(x)\sum_y y^2p_Y(y)}$, we see that $(1)$ and $(2)$ are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise}\end{cases}$. Thus, $(2)$ can be thought of as a more general form of the inequality.
Another way to think about this is to compare $\lvert\cos\theta\rvert=\frac{\lvert\mathbf x^\mathrm T\mathbf y\rvert}{\lVert\mathbf x\rVert\lVert\mathbf y\rVert}\le1$ with $\lvert\rho\rvert=\frac{\lvert\mathbf{cov}(X,Y)\rvert}{\sqrt{\mathbf{var}(X)\,\mathbf{var}(Y)}}\le1$. The former is exactly $(1)$, while the latter becomes $(2)$ only when $\mathbf E[X]=\mathbf E[Y]=0$. In some sense, we can view $\mathbf x^\mathrm T\mathbf y$ as a special form of $\mathbf{cov}(X,Y)$. It then follows that $\mathbf x^\mathrm T\mathbf x$ is a form of $\mathbf{var}(X)$ and $\lVert\mathbf x\rVert$ is a form of $\sqrt{\mathbf{var}(X)}$.
What is the special form of $\mathbf E[X]$, and how do we understand $\mathbf E[X]=\mathbf E[Y]=0$ in linear algebra? With $p_{X,Y}$ defined above, we have $\mathbf E[XY]=\frac{\mathbf x^\mathrm T\mathbf y}n$, but $\mathbf{cov}(X,Y)\ne\mathbf E[XY]$ unless $\mathbf E[X]=0$ or $\mathbf E[Y]=0$. How can we obtain a relation between $\mathbf{cov}(X,Y)$ and $\mathbf x^\mathrm T\mathbf y$?
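To make the correspondence concrete, here is a minimal NumPy sketch (the vectors, seed, and variable names are arbitrary choices of mine, not part of the statement): under the diagonal uniform $p_{X,Y}$ above, each side of $(2)$ is just the corresponding side of $(1)$ divided by $n$ and squared.

```python
import numpy as np

# Minimal check that (1) and (2) coincide under the uniform "diagonal"
# joint pmf p_{X,Y}(x_i, y_i) = 1/n.  The vectors and seed are arbitrary.
rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Linear-algebra form (1)
lhs1 = abs(x @ y)
rhs1 = np.linalg.norm(x) * np.linalg.norm(y)

# Probability form (2): expectations under the uniform diagonal pmf
E_XY = (x * y).mean()          # E[XY]  = x^T y / n
E_X2 = (x ** 2).mean()         # E[X^2] = ||x||^2 / n
E_Y2 = (y ** 2).mean()         # E[Y^2] = ||y||^2 / n

print(lhs1 <= rhs1, E_XY ** 2 <= E_X2 * E_Y2)        # True True
print(np.isclose(E_XY ** 2, (lhs1 / n) ** 2))        # LHS of (2) = (LHS of (1) / n)^2
print(np.isclose(E_X2 * E_Y2, (rhs1 / n) ** 2))      # RHS of (2) = (RHS of (1) / n)^2
```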
linear-algebra probability-theory cauchy-schwarz-inequality
asked Jan 13 at 12:33, edited Jan 15 at 7:50 – W. Zhu
Not "the same as", rather "a particular case of" (can you spot how?). – Did, Jan 13 at 13:01

@Did The two inequalities are equivalent when $p_{X,Y}(x,y)=\begin{cases}\frac1n&\text{if $x=x_i$ and $y=y_i$ for $i\in\{1,2,\cdots,n\}$}\\0&\text{otherwise}\end{cases}$! – W. Zhu, Jan 14 at 2:59

Thus, question solved? – Did, Jan 14 at 11:26

@Did I have one more question. If we write $\mathbf{cov}(X,Y)$ as $\mathbf x^\mathrm T\mathbf y$, then $\lvert\rho\rvert\le1$ becomes $\lvert\cos\theta\rvert\le1$. But we need to set $\mathbf E[X]=\mathbf E[Y]=0$, which means that the components of each of $\mathbf x$ and $\mathbf y$ average to zero. Shouldn't the inequality hold for all vectors $\mathbf x$ and $\mathbf y$? – W. Zhu, Jan 14 at 15:07

I don't understand the downvote, as is often the case when there's no comment accompanying it. Anyway, there's a recent question on the covariance which addresses exactly the doubts of this post. – Giuseppe Negro, Jan 15 at 13:32
1 Answer
After reading J.G.'s answer and thinking it over, I have arrived at a satisfactory answer. I will post my thoughts below.
Let $\mathbf x\in\Bbb R^n$ denote a discrete uniform random variable, with each component corresponding to one outcome. Then $\mathbf E[\mathbf x]$ is the average of the components, and $\mathbf E[\mathbf x]=0$ means that the components sum to zero. Thus, for zero-mean random variables, we can choose $n-1$ components freely and set the last component to $-\sum_{i=1}^{n-1}x_i$. These vectors form an $(n-1)$-dimensional subspace. We can bring any vector into this centered subspace $C$ by subtracting from each component the average of all the components.
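As a small illustration (a NumPy sketch; the example vector and the helper name `center` are mine), centering a vector by subtracting its component mean lands it in $C$:

```python
import numpy as np

def center(v):
    """Project v onto the centered subspace C by removing the component mean."""
    v = np.asarray(v, dtype=float)
    return v - v.mean()

x = np.array([3.0, 1.0, 5.0, 7.0])   # arbitrary example vector
xc = center(x)
print(xc)                            # [-1. -3.  1.  3.]
print(np.isclose(xc.sum(), 0.0))     # True: the components of xc sum to zero
```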
Now we consider two vectors $\mathbf x$ and $\mathbf y$ in $C$. We can use a matrix to represent the joint distribution. Put $x_i$'s in the rows and $y_i$'s in the columns, and consider this joint distribution matrix:
$$D=
\begin{bmatrix}
\frac1n&0&0&\cdots&0\\
0&\frac1n&0&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&\frac1n
\end{bmatrix}$$
This distribution is special because it puts equal weight on the diagonal entries and zero weight on the off-diagonal entries. We may call this the discrete uniform diagonal joint distribution. It is easily seen that $\mathbf x$ and $\mathbf y$ are discrete uniform but not independent ($\mathbf x$ being $x_i$ forces $\mathbf y$ to be $y_i$).
Under these assumptions, $\mathbf{cov}(\mathbf x,\mathbf y)=\frac{\mathbf x^\mathrm T\mathbf y}n$, $\mathbf{var}(\mathbf x)=\frac{\mathbf x^\mathrm T\mathbf x}n$, $\sigma_{\mathbf x}=\frac{\lVert\mathbf x\rVert}{\sqrt n}$ and $\rho=\frac{\mathbf{cov}(\mathbf x,\mathbf y)}{\sigma_{\mathbf x}\sigma_{\mathbf y}}=\frac{\mathbf x^\mathrm T\mathbf y}{\lVert\mathbf x\rVert\lVert\mathbf y\rVert}=\cos\theta$. When $\mathbf x$ and $\mathbf y$ are orthogonal vectors, they are uncorrelated random variables. Although they are linearly independent vectors, they are not independent random variables.
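Here is a quick NumPy check of these identities (the vectors are arbitrary and then centered as above; the expectations are computed by hand under the uniform diagonal distribution rather than with any library routine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x = rng.standard_normal(n); x -= x.mean()   # bring x into C
y = rng.standard_normal(n); y -= y.mean()   # bring y into C

cov_xy = (x * y).mean()                     # E[xy]; means are zero, so this is cov
var_x, var_y = (x ** 2).mean(), (y ** 2).mean()

print(np.isclose(cov_xy, x @ y / n))        # cov(x, y) = x^T y / n
print(np.isclose(var_x, x @ x / n))         # var(x)    = x^T x / n

rho = cov_xy / np.sqrt(var_x * var_y)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.isclose(rho, cos_theta))           # correlation coefficient = cos(theta)
```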
Now we have a correspondence between covariance and dot product, standard deviation and length, correlation coefficient and the cosine of the angle between two vectors, and uncorrelatedness and orthogonality. Thus, the Schwarz inequality $\lvert\cos\theta\rvert\le1$ matches $\lvert\rho\rvert\le1$.
Let us look at three more examples that connect linear algebra to probability theory (a numerical check of the first two follows the list):
- The triangle inequality $\lVert\mathbf x+\mathbf y\rVert\le\lVert\mathbf x\rVert+\lVert\mathbf y\rVert$ matches $\sigma_{X+Y}\le\sigma_X+\sigma_Y$.
- $(\mathbf x+\mathbf y)^\mathrm T(\mathbf x+\mathbf y)=\mathbf x^\mathrm T\mathbf x+\mathbf y^\mathrm T\mathbf y+2\mathbf x^\mathrm T\mathbf y$ matches $\mathbf{var}(X+Y)=\mathbf{var}(X)+\mathbf{var}(Y)+2\,\mathbf{cov}(X,Y)$.
- Pythagoras' theorem $\lVert\mathbf b\rVert^2=\lVert\mathbf p\rVert^2+\lVert\mathbf e\rVert^2$, with orthogonal projection $\mathbf p$ and error $\mathbf e=\mathbf b-\mathbf p$, matches $\mathbf{var}(\Theta)=\mathbf{var}(\hat\Theta)+\mathbf{var}(\tilde\Theta)$, with uncorrelated estimator $\hat\Theta$ and estimation error $\tilde\Theta=\Theta-\hat\Theta$. In fact, this is just the law of total variance $\mathbf{var}(\Theta)=\mathbf{var}(\mathbf E[\Theta|X])+\mathbf E[\mathbf{var}(\Theta|X)]$ with $\hat\Theta=\mathbf E[\Theta|X]$.
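And the promised numerical check of the first two bullet points (a sketch with arbitrary centered vectors again; the third bullet would need a conditional model, so I leave it symbolic):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.standard_normal(n); x -= x.mean()
y = rng.standard_normal(n); y -= y.mean()

var = lambda v: (v ** 2).mean()             # population variance of a centered vector
cov_xy = (x * y).mean()

# (x+y)^T(x+y) = x^T x + y^T y + 2 x^T y  <->  var(X+Y) = var(X) + var(Y) + 2 cov(X,Y)
print(np.isclose(var(x + y), var(x) + var(y) + 2 * cov_xy))        # True

# ||x+y|| <= ||x|| + ||y||  <->  sigma_{X+Y} <= sigma_X + sigma_Y
print(np.sqrt(var(x + y)) <= np.sqrt(var(x)) + np.sqrt(var(y)))    # True
```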
answered Jan 16 at 11:22, edited Jan 16 at 13:20 – W. Zhu