Weighted Least Squares
I understand the concept of least squares, but I'm not able to wrap my head around weighted least squares (the matrix form).
We convert $Ax = b$ to $WAx = Wb$. What exactly happens when we multiply the equation by $W$? Is the column space of $A$ modified by the changed equations? Also, how do I find the matrix $W$, assuming I am given the data (the probability of each observation), as in the following textbook example ("Linear Algebra and Its Applications" by Gilbert Strang, page 174, question 42)?
Suppose you guess your professor's age, making errors $e = -2, -1, 5$ with probabilities $1/2, 1/4, 1/4$. If the professor guesses too (or tries to remember), making errors $-1, 0, 1$ with probabilities $1/8, 6/8, 1/8$, what weights $w_1$ and $w_2$ give the reliability of your guess and the professor's guess?
linear-algebra statistics
asked Jan 31 at 14:10 by Cyanide2002 (edited Jan 31 at 15:49)
I believe the matrix $W$ is a diagonal matrix and is essentially a matrix of weights. Perhaps post more context from the textbook so I can say for sure (or include the textbook reference, page number and name of book).
– JEET TRIVEDI, Jan 31 at 15:38
1 Answer
It's probably easiest to understand weighted linear least squares (LLS) by explaining the motivation behind "ordinary" (i.e., non-weighted) LLS first.
Ordinary LLS
The setting is as follows: you are given measurements $(x_{1},y_{1}),\ldots,(x_{N},y_{N})$ where $x_{n}\in\mathbb{R}^{d}$ and $y_{n}\in\mathbb{R}$.
You are asked to find an affine function $f$ such that $f(x_{n})\approx y_{n}$ for each $n$.
The hope is that this affine function can give you good estimates of the output parameter $y$ for arbitrary inputs $x$.
One way to find such a function is to pick it such that the "mean squared error"
$$
\text{MSE}\equiv\frac{1}{N}\sum_{n}\left(f(x_{n})-y_{n}\right)^{2}
$$
is minimized.
Since $f$ is assumed to be affine, it has the form $f(x)=\beta_{0}+x^{\intercal}\beta_{1}$.
Plugging this into the above,
$$
\text{MSE}=\frac{1}{N}\sum_{n}\left(\beta_{0}+x_{n}^{\intercal}\beta_{1}-y_{n}\right)^{2}.
$$
Let's take a short detour and rewrite the MSE in terms of matrices and vectors (this will help us take derivatives in the next paragraph).
In order to do so, let
$$
X\equiv\begin{pmatrix}1 & x_{1}^{\intercal}\\
1 & x_{2}^{\intercal}\\
\vdots & \vdots\\
1 & x_{N}^{\intercal}
\end{pmatrix}\text{, }y\equiv\begin{pmatrix}y_{1}\\
y_{2}\\
\vdots\\
y_{N}
\end{pmatrix}\text{, and }\beta\equiv\begin{pmatrix}\beta_{0}\\
\beta_{1}
\end{pmatrix}.
$$
Then,
$$
\text{MSE}=\frac{1}{N}\Vert X\beta-y\Vert^{2}=\frac{1}{N}\left(X\beta-y\right)^{\intercal}\left(X\beta-y\right).
$$
Recall from calculus that the minimum of the MSE occurs at points where its derivative is zero.
Using matrix calculus, the derivative is
$$
\frac{\partial\text{MSE}}{\partial\beta}=\frac{2}{N}\frac{\partial}{\partial\beta}\left[X\beta-y\right]\left(X\beta-y\right)=\frac{2}{N}X^{\intercal}\left(X\beta-y\right).
$$
Setting this to zero, we get the normal equations
$$
X^{\intercal}X\beta=X^{\intercal}y.
$$
Defining $A\equiv X^{\intercal}X$ and $b\equiv X^{\intercal}y$, the above becomes the square linear system $A\beta=b$.
You can solve this with ordinary tools from linear algebra when $A$ is nonsingular.
You can also solve it when $A$ is singular, but doing so requires the notion of a pseudo-inverse, which is more advanced material that you can safely ignore for now.
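To make this concrete, here is a minimal NumPy sketch of the recipe above on some made-up one-dimensional data (the data and variable names are purely illustrative):
```python
import numpy as np

# Hypothetical one-dimensional data (d = 1): y is roughly 2 + 3x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)            # inputs x_1, ..., x_N
y = 2.0 + 3.0 * x + rng.normal(0, 1, 50)   # outputs y_1, ..., y_N

# Design matrix X with a leading column of ones (for the intercept beta_0).
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [2, 3]

# np.linalg.lstsq solves the same least-squares problem and also copes
# with a singular X^T X (it uses a pseudo-inverse internally).
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```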
Weighted LLS
What is weighted LLS?
In weighted LLS, you assign a "belief" to each measurement $(x_{n},y_{n})$.
This allows you to value certain measurements more than others.
The beliefs are positive numbers $w_{1},\ldots,w_{N}$.
The larger $w_{n}$ is, the more you believe in the measurement $(x_{n},y_{n})$.
The "weighted mean squared error" is
$$
\text{WMSE}\equiv\frac{1}{N}\sum_{n}w_{n}\left(\beta_0 + x_n^\intercal \beta_1-y_{n}\right)^{2}.
$$
Defining the diagonal matrix
$$
W^{\frac{1}{2}}=\begin{pmatrix}\sqrt{w_{1}}\\
 & \sqrt{w_{2}}\\
 & & \ddots\\
 & & & \sqrt{w_{N}}
\end{pmatrix},
$$
we can rewrite the WMSE in the compact form
$$
\text{WMSE}=\frac{1}{N}\left(W^{\frac{1}{2}}\left(X\beta-y\right)\right)^{\intercal}\left(W^{\frac{1}{2}}\left(X\beta-y\right)\right).
$$
Taking the derivative,
$$
\frac{\partial\text{WMSE}}{\partial\beta}=\frac{2}{N}\frac{\partial}{\partial\beta}\left[W^{\frac{1}{2}}\left(X\beta-y\right)\right]W^{\frac{1}{2}}\left(X\beta-y\right)=\frac{2}{N}X^{\intercal}W\left(X\beta-y\right).
$$
Setting this to zero, we get the weighted normal equations
$$
X^{\intercal}WX\beta=X^{\intercal}Wy.
$$
Defining $A^{(w)}\equiv X^{\intercal}WX$ and $b^{(w)}\equiv X^{\intercal}Wy$, the above becomes the square linear system $A^{(w)}\beta=b^{(w)}$.
As usual, you can tackle this with ordinary linear algebra.
Since the weights are positive and the matrix $W$ is diagonal, the column spaces of $X^{\intercal}$ and $X^{\intercal}W$ are the same.
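A similarly minimal sketch of weighted LLS, assuming the weights $w_n$ are already known (here they are made up for illustration); note that applying $W^{1/2}$ is just a row-rescaling of the ordinary problem:
```python
import numpy as np

# Hypothetical data, as in the previous sketch.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])

# Illustrative weights: trust the first half of the measurements twice as much.
w = np.ones(len(x))
w[:25] = 2.0

# W is diagonal, so X^T W X and X^T W y can be formed without building W
# explicitly: scaling column n of X^T by w_n is the same as X^T @ diag(w).
XtW = X.T * w
beta_w = np.linalg.solve(XtW @ X, XtW @ y)
print(beta_w)

# Equivalent route: rescale each row of X and y by sqrt(w_n), then run
# ordinary least squares on the system W^{1/2} X beta = W^{1/2} y.
sw = np.sqrt(w)
beta_w2, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```
This row-rescaling is the $WAx = Wb$ transformation from the question (here in the form $W^{1/2}X\beta = W^{1/2}y$): each equation is simply counted more or less heavily in the squared error.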
Professor's age
We are given two guesses $y_1$ and $y_2$ of the professor's age and asked to produce our own final guess $\beta$ by choosing appropriate weights $w_1$ and $w_2$.
The weighted MSE in this case is
$$
\text{WMSE} \equiv \frac{1}{2} \left[ w_1 \left(\beta - y_1\right)^2 + w_2 \left(\beta - y_2\right)^2 \right].
$$
Defining $X = (1, 1)^\intercal$, $y = (y_1, y_2)^\intercal$, and $W = \operatorname{diag}(w_1, w_2)$, the arguments above imply that
$$
\beta = (X^\intercal W X)^{-1} X^\intercal W y
$$
minimizes the WMSE.
You can check that the above is equivalent to
$$
\beta = \frac{w_1 y_1 + w_2 y_2}{w_1 + w_2}.
$$
Since only the ratio of the weights matters, we can pick $w_2 = 1$ without loss of generality, so that
$$
\beta = \frac{w_1 y_1 + y_2}{1 + w_1}.
$$
Next, let $\beta^\star$ be the professor's true age.
Note that
$$
\beta - \beta^\star = \frac{w_1 \left(y_1 - \beta^\star\right) + \left(y_2 - \beta^\star\right)}{1 + w_1}.
$$
Let $y_1$ be the student's guess and $y_2$ be the professor's.
As per the question statement, both are unbiased estimators of the professor's age:
\begin{align*}
\mathbb{E}[y_1 - \beta^\star] & = -2 \cdot \frac{1}{2} - 1 \cdot \frac{1}{4} + 5 \cdot \frac{1}{4} = 0\\
\mathbb{E}[y_2 - \beta^\star] & = -1 \cdot \frac{1}{8} + 0 \cdot \frac{6}{8} + 1 \cdot \frac{1}{8} = 0.
\end{align*}
Therefore, $\mathbb{E}[\beta - \beta^\star] = 0$, and the expected value cannot help us pick the weights.
We therefore look to the variance:
$$
\operatorname{Var}(\beta-\beta^{\star})=\frac{1}{\left(1+w_{1}\right)^{2}}\left(w_{1}^{2}\operatorname{Var}(y_{1}-\beta^{\star})+\operatorname{Var}(y_{2}-\beta^{\star})+2w_{1}\operatorname{Cov}(y_{1}-\beta^{\star},y_{2}-\beta^{\star})\right).
$$
Note that
\begin{align*}
\operatorname{Var}(y_1 - \beta^\star) & = (-2)^2 \cdot \frac{1}{2} + (-1)^2 \cdot \frac{1}{4} + 5^2 \cdot \frac{1}{4} = \frac{17}{2}, \\
\operatorname{Var}(y_2 - \beta^\star) & = (-1)^2 \cdot \frac{1}{8} + 1^2 \cdot \frac{1}{8} = \frac{1}{4}.
\end{align*}
For brevity, let $c \equiv \operatorname{Cov}(y_{1}-\beta^{\star},y_{2}-\beta^{\star})$.
Plugging these values back into the variance equation,
$$
\operatorname{Var}(\beta-\beta^{\star})=\frac{1}{\left(1+w_{1}\right)^{2}}\left(\frac{17}{2}w_{1}^{2}+\frac{1}{4}+2w_{1}c\right).
$$
To pick the "best" weight $w_1$, we minimize the variance of $\beta - \beta^\star$.
However, in order to do so, we need the quantity $c$, which was not given to us in the question! As such, we can only make an educated guess.
Setting the derivative with respect to $w_1$ to zero gives the minimizer $w_1 = \frac{1/4 - c}{17/2 - c}$.
In particular, if the two guesses are independent ($c = 0$), this gives $w_1 = \frac{1/4}{17/2} = \frac{1}{34}$: each guess is weighted inversely proportionally to its error variance, which is the usual weighted least squares prescription.
As another example, guessing $c=-1/5$ implies (by ordinary calculus) that $w_1 = 3/58$ minimizes the variance.
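As a quick numerical sanity check of the variance formula above (with the covariance $c$ treated as a free parameter), one can compare a brute-force search over $w_1$ against the closed-form minimizer $w_1 = (1/4 - c)/(17/2 - c)$:
```python
import numpy as np

var1, var2 = 17 / 2, 1 / 4  # Var(y1 - beta*), Var(y2 - beta*) from above

def combined_variance(w1, c):
    # Var(beta - beta*) for weights (w1, 1) and covariance c.
    return (var1 * w1**2 + var2 + 2 * w1 * c) / (1 + w1) ** 2

w_grid = np.linspace(0, 1, 200001)  # brute-force search over w1 in [0, 1]
for c in (0.0, -1 / 5):
    w_best = w_grid[np.argmin(combined_variance(w_grid, c))]
    w_closed = (var2 - c) / (var1 - c)  # from setting the derivative to zero
    print(f"c = {c}: grid minimizer {w_best:.5f}, closed form {w_closed:.5f}")
    # c = 0    -> w1 = 1/34 ≈ 0.02941
    # c = -1/5 -> w1 = 3/58 ≈ 0.05172
```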
answered Jan 31 at 17:40 by parsiad (edited Feb 2 at 4:17)
Thanks a lot for this detailed answer; I understand the concept of weighted least squares a lot better now! However, I'm still unclear as to how to assign the weights properly. Perhaps you could tell me the procedure for the problem mentioned in my question or point me in the right direction? – Cyanide2002, Feb 1 at 10:04
I'm not sure; I copied the textbook example. I was thinking that it is possible to find the weights if we knew each error and its probability. Am I wrong? – Cyanide2002, Feb 2 at 2:17
I think I understand the question. I added it to my answer above. – parsiad, Feb 2 at 4:18
What does $\operatorname{Cov}(y_{1}-\beta^{\star},y_{2}-\beta^{\star})$ mean? Could you please clarify? – Cyanide2002, Feb 3 at 15:24
$\text{Cov}(A,B)$ is the covariance of $A$ and $B$: en.wikipedia.org/wiki/Covariance – parsiad, Feb 3 at 21:58