Why Does SVD Provide the Least Squares and Least Norm Solution to $Ax = b$?
I am studying the Singular Value Decomposition and its properties. It is widely used to solve equations of the form $Ax=b$. I have seen the following: when we have the equation system $Ax=b$, we compute the SVD of $A$ as $A=U\Sigma V^T$. Then we compute $x'= V \Sigma^{+}U^Tb$. Here $\Sigma^{+}$ has the reciprocals ($\dfrac{1}{\sigma_i}$) of the singular values on its diagonal and zeros where $\sigma_i=0$. If $b$ is in the range of $A$, then $x'$ is the solution with minimum norm (closest to the origin). If $b$ is not in the range, then $x'$ is the least squares solution.
I fail to see how exactly this procedure always produces an $x'$ that is closest to the origin when $b$ is in the range of $A$. (I can see that the least squares solution is an extension of this "closest to origin" property.) How can we show this property of the SVD, ideally in a geometrically intuitive way?
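For concreteness, here is a minimal numerical sketch of the procedure just described, using NumPy (the matrix and right-hand side are made up for illustration): build $\Sigma^{+}$ from the SVD, form $x' = V\Sigma^{+}U^Tb$, and compare it with the library's pseudoinverse and least-squares routines, which also return the minimum-norm least-squares solution.

```python
import numpy as np

# A made-up rank-deficient system for illustration: A is 4x3 with rank 2.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])
b = np.array([1., 2., 3., 4.])

U, s, Vt = np.linalg.svd(A)                 # A = U @ Sigma @ Vt

# Sigma^+ : reciprocals of the nonzero singular values, zeros where sigma_i ~ 0.
tol = max(A.shape) * np.finfo(float).eps * s.max()
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:len(s), :len(s)] = np.diag([1.0 / si if si > tol else 0.0 for si in s])

x_prime = Vt.T @ Sigma_plus @ U.T @ b       # x' = V Sigma^+ U^T b

# Agrees with the library's pseudoinverse and SVD-based least-squares solver.
print(np.allclose(x_prime, np.linalg.pinv(A) @ b))
print(np.allclose(x_prime, np.linalg.lstsq(A, b, rcond=None)[0]))
```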
linear-algebra optimization svd least-squares
Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically, this issue is on pages 50-60.
– Royi
Jun 16 '18 at 18:11
4 Answers
First, consider the problem $\Sigma x = b$, where
$$
\Sigma = \pmatrix{\sigma_1 \\ & \ddots \\ && \sigma_r \\ &&& 0 \\ &&&& \ddots \\ &&&&& 0}
$$
Note that $b$ is only in the range of $\Sigma$ if its entries $b_{r+1},\dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
$$
x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b
$$
From there, note that
$$
U\Sigma V^T x = b \implies\\
\Sigma (V^T x ) = U^T b
$$
By the above argument, the least squares solution for $(V^T x)$ is given by
$V^T x = \Sigma^+ U^T b$. Noting that $|V^T x| = |x|$, we can use this to conclude that $x = (V \Sigma^+ U^T)b$ must be the least squares solution (for $x$).
I hope you find this explanation sufficient.
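To illustrate the argument above, here is a small numerical check (a made-up rank-2 diagonal $\Sigma$ in $\mathbb{R}^4$, assuming NumPy): $\Sigma^{+}b$ solves the system exactly, and putting anything nonzero in the coordinates where $\sigma_i = 0$ still solves it but only increases the norm.

```python
import numpy as np

sigma = np.array([3.0, 2.0, 0.0, 0.0])        # rank-2 diagonal Sigma, n = 4
Sigma = np.diag(sigma)
b = np.array([6.0, 4.0, 0.0, 0.0])            # b in the range: b_3 = b_4 = 0

# Sigma^+ b : divide by the nonzero singular values, zero elsewhere.
x_star = np.array([bi / si if si != 0 else 0.0 for bi, si in zip(b, sigma)])
print(x_star)                                  # [2. 2. 0. 0.]
print(np.allclose(Sigma @ x_star, b))          # exact solution

# Any other solution differs only in the coordinates where sigma_i = 0,
# and those coordinates add to the squared norm.
other = x_star + np.array([0.0, 0.0, 5.0, -1.0])
print(np.allclose(Sigma @ other, b))                    # still a solution...
print(np.linalg.norm(other) > np.linalg.norm(x_star))   # ...but a longer one
```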
A question: In the beginning, did we assume that $\Sigma$ is an $n \times n$ square matrix? What is the reason that the entries from $n+1$ to $n$ should be zero, such that $b$ is in the range of $\Sigma$?
– Ufuk Can Bicici
Oct 15 '14 at 5:27
Wait a minute; you mean that $b$ has its entries from $r+1$ to $n$ equal to zero, right?
– Ufuk Can Bicici
Oct 15 '14 at 5:40
For the moment, I assume that $\Sigma$ is an $n \times n$ square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries on its diagonal. (Correct me if I am wrong, I am not on the best terms with linear algebra :) ) Then $\Sigma x = b$ has only a single unique solution, which is, as you have written, $x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T$. Now since we have only a single solution, how can we talk about a least squares solution as well? Unfortunately I became heavily confused.
– Ufuk Can Bicici
Oct 15 '14 at 9:51
For convenience, I was indeed talking about square matrices. If $\Sigma$ has rank $r<n$ and $b$ is in the range of $\Sigma$, then of course $\Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
– Omnomnomnom
Oct 15 '14 at 11:18
What if $x_{r+1},\dots,x_{n}$ are non-zero?
– Omnomnomnom
Oct 15 '14 at 11:48
The pseudoinverse solution from the SVD is derived in the post "proving standard least square problem with SVD".
Given $\mathbf{A}x=b$, where the data vector $b\notin\mathcal{N}\left( \mathbf{A}^{*} \right)$, the least squares solution exists and is given by
$$
x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}, \quad y\in\mathbb{C}^{n}
$$
where blue vectors are in the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*} \right)}$ and red vectors are in the null space $\color{red}{\mathcal{N}\left( \mathbf{A} \right)}$.
The least squares solution minimizes the sum of the squares of the residual errors, and the set of minimizers is an affine space. That is,
$$
\lVert \mathbf{A} x_{LS} (y) - b \rVert_{2}^{2} = r^{2}_{min}
$$
for all values of $y$.
What is the vector in this affine space with the smallest length? The length of the solution vectors is
$$
\lVert x_{LS} \rVert_{2}^{2}
=
\lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2}
=
\lVert \color{blue}{\mathbf{A}^{\dagger}b} \rVert_{2}^{2}
+
\underbrace{\lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2}}_{y=\mathbf{0}}
$$
The solution vector of minimum length is $\color{blue}{\mathbf{A}^{\dagger}b}$, the point in the affine space closest to the origin.
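A quick numerical sketch of this decomposition (a random rank-deficient matrix standing in for $\mathbf{A}$, NumPy assumed): $\mathbf{A}^{\dagger}b$ and $(\mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A})y$ are orthogonal, every choice of $y$ gives the same residual, and the squared norms add, so $y = \mathbf{0}$ gives the shortest solution.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # 6x4, rank 2
b = rng.standard_normal(6)

A_pinv = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse A^+
x0 = A_pinv @ b                            # the "blue" term  A^+ b
P_null = np.eye(4) - A_pinv @ A            # projector onto the null space of A
y = rng.standard_normal(4)
x_ls = x0 + P_null @ y                     # another member of the solution set

# Every choice of y gives the same residual ...
print(np.allclose(np.linalg.norm(A @ x_ls - b), np.linalg.norm(A @ x0 - b)))
# ... the two pieces are orthogonal ...
print(np.isclose(x0 @ (P_null @ y), 0.0))
# ... so the squared norms add, and y = 0 gives the shortest solution.
print(np.isclose(np.linalg.norm(x_ls)**2,
                 np.linalg.norm(x0)**2 + np.linalg.norm(P_null @ y)**2))
```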
are you using $A^*$ or $A^\dagger$ to denote the Hermitian conjugate?
– glS
Nov 16 '18 at 13:20
@glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question, notational errors have been corrected.
– dantopa
Nov 19 '18 at 23:40
Let's fix the dimensions for the sake of making the example simpler and say that $A:\mathbb{R}^8\to\mathbb{R}^5$ and that $\operatorname{rank}(A)=3$.
$A=U\Sigma V^T$ is the singular value decomposition of $A$ and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $\mathbb{R}^5$ and $\mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.
So $A(\mathbb{R}^8)$ is a 3-dimensional subspace of $\mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $\Sigma$ and $V$?
$U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$, which give us an orthogonal basis in $\mathbb{R}^5$ such that the first three column vectors span $\operatorname{Im}(A)$. You can see this by noting that
$U\Sigma f_1 = U\sigma_1 e_1 \ne 0$
$U\Sigma f_2 = U\sigma_2 e_2 \ne 0$
$U\Sigma f_3 = U\sigma_3 e_3 \ne 0$
$U\Sigma f_4 = \dots = U\Sigma f_8 = 0$
Since $U$ is orthogonal, its inverse is $U^T$.
Similarly, $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$, which give us an orthogonal basis in $\mathbb{R}^8$ such that the last 5 span $\ker(A)$:
$AVf_i = U\Sigma V^TVf_i = U\Sigma f_i \ne 0$ for $i = 1, 2, 3$
$AVf_i = U\Sigma V^TVf_i = U\Sigma f_i = 0$ for $i = 4, \dots, 8$
And the inverse of $V$ is $V^T$.
$\Sigma$ jams $\mathbb{R}^8$ into $\mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $\sigma_1, \sigma_2, \sigma_3$ in the process) while squashing those spanned by $f_4, \dots, f_8$.
It follows that $A$ jams $\mathbb{R}^8$ into $\mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $\sigma_1, \sigma_2, \sigma_3$ in the process) while squashing those spanned by $Vf_4, \dots, Vf_8$.
The key here is that, restricted to the space spanned by $Vf_1, Vf_2, Vf_3$, $A$ is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $V\Sigma^+U^T$ is the inverse (when restricted to $Ue_1, Ue_2, Ue_3$).
If we have $Ax = b$ and $b \in \operatorname{Im}(A)$, it follows that there exists a unique $x' \in \langle Vf_1, Vf_2, Vf_3\rangle$ s.t. $Ax' = b$. Any other solution $x$ in $\mathbb{R}^8$ takes the form $x' + \delta$ for $\delta \in \ker(A)$. Now, since we can decompose $\mathbb{R}^8$ into $\langle Vf_1, Vf_2, Vf_3\rangle \oplus \langle Vf_4, \dots, Vf_8\rangle$, we have $\lvert x\rvert^2 = \langle x' + \delta, x' + \delta\rangle = \lvert x'\rvert^2 + \lvert \delta\rvert^2$ and so $\lvert x\rvert \ge \lvert x'\rvert$ - that is, $x'$ is the solution closest to the origin.
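Here is the same picture in code (random matrices standing in for the $\mathbb{R}^8 \to \mathbb{R}^5$, rank-3 setup; NumPy assumed): the minimum-norm solution $V\Sigma^{+}U^Tb$ has no component along the last five columns of $V$, and adding a kernel vector $\delta$ only makes the solution longer.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))  # rank 3, R^8 -> R^5
U, s, Vt = np.linalg.svd(A)                      # columns of V = Vt.T play the role of Vf_i

x_true = Vt.T[:, :3] @ rng.standard_normal(3)    # pick x in span(Vf_1, Vf_2, Vf_3)
b = A @ x_true                                   # so b is in Im(A)

x_prime = np.linalg.pinv(A) @ b                  # V Sigma^+ U^T b
print(np.allclose(x_prime, x_true))              # recovers the row-space solution

# Components of x' along the last five columns of V (the kernel directions) vanish.
print(np.allclose(Vt[3:] @ x_prime, 0.0))

# Any other solution x' + delta with delta in ker(A) is longer.
delta = Vt.T[:, 3:] @ rng.standard_normal(5)     # a kernel vector
x_other = x_prime + delta
print(np.allclose(A @ x_other, b))
print(np.linalg.norm(x_other) >= np.linalg.norm(x_prime))
```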
The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an ellipse. This can be done in a three-step process using the SVD:
- Rotate the unit circle so it can be stretched along its axis
- Stretch each axis to form the ellipse
- Rotate again to align the ellipse with the output space of $A$
To solve for $x$, you reverse this process, starting with $b$.
Least squares comes in when step 2 creates an ellipse with a width of zero. When you go through this process in reverse, at step 2 the un-stretching throws away the dimension with zero width. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.
In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ results in the closest point on that line to point $b$.
https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf
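As a tiny 2D illustration of this picture (the rank-1 matrix below is invented for the demo, NumPy assumed): $A$ collapses the unit circle onto a line, and the SVD-based $x'$ is the point whose image $Ax'$ is the closest point on that line to $b$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                  # rank 1: A maps the unit circle onto a line
b = np.array([1.0, 0.0])                    # b does not lie on that line

U, s, Vt = np.linalg.svd(A)
s_plus = np.array([1.0 / si if si > 1e-12 else 0.0 for si in s])   # "un-stretch"; the flat axis is dropped
x_prime = Vt.T @ np.diag(s_plus) @ U.T @ b

# A x' is the orthogonal projection of b onto the line Im(A) = span(u_1).
u1 = U[:, 0]
print(np.allclose(A @ x_prime, (u1 @ b) * u1))

# Perturbing x never gives a smaller residual ||Ax - b||.
for dx in (-0.5, 0.0, 0.5):
    for dy in (-0.5, 0.0, 0.5):
        x = x_prime + np.array([dx, dy])
        assert np.linalg.norm(A @ x - b) >= np.linalg.norm(A @ x_prime - b) - 1e-12
print("x' minimizes the residual among the sampled points")
```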