Why Does SVD Provide the Least Squares and Least Norm Solution to $Ax = b$?


























I am studying the Singular Value Decomposition and its properties. It is widely used in order to solve equations of the form $Ax=b$. I have seen the following: when we have the equation system $Ax=b$, we calculate the SVD of $A$ as $A=U\Sigma V^T$. Then we calculate $x'= V \Sigma^{+}U^Tb$. $\Sigma^{+}$ has the reciprocals ($\dfrac{1}{\sigma_i}$) of the singular values on its diagonal and zeros where $\sigma_i=0$. If $b$ is in the range of $A$, then $x'$ is the solution with the minimum norm (closest to the origin). If $b$ is not in the range, then $x'$ is the least squares solution.



I fail to see how exactly this procedure always produces an $x'$ which is closest to the origin when $b$ is in the range of $A$. (I can see that the least squares solution is an extension of this "closest to origin" property.) How can we show this property of the SVD, from a geometric, intuitive viewpoint if possible?
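For concreteness, here is the procedure as a small NumPy sketch (the matrix and right-hand side below are made-up example data; np.linalg.pinv and np.linalg.lstsq are only used to cross-check the $V \Sigma^{+} U^T b$ product):

    import numpy as np

    # Made-up rank-deficient example: the third column is the sum of the first two,
    # so A has rank 2, and this particular b lies in the range of A (b = col1 + 2*col2).
    A = np.array([[1., 0., 1.],
                  [0., 1., 1.],
                  [1., 1., 2.],
                  [2., 1., 3.]])
    b = np.array([1., 2., 3., 4.])

    # x' = V Sigma^+ U^T b, inverting only the nonzero singular values
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > 1e-10 * s.max()          # crude cutoff for "zero" singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    x_prime = Vt.T @ (s_inv * (U.T @ b))

    print(x_prime)                               # ~ [0. 1. 1.], the minimum-norm exact solution
    print(np.linalg.pinv(A) @ b)                 # same vector
    print(np.linalg.lstsq(A, b, rcond=None)[0])  # same vector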


































  • Proof and deep analysis can be found in my Singular Value Decomposition (SVD) Presentation. Specifically this issue is on pages 50-60.
    – Royi
    Jun 16 '18 at 18:11
















4 Answers
































First, consider the problem $\Sigma x = b$, where
$$
\Sigma = \begin{pmatrix}\sigma_1\\& \ddots\\&&\sigma_r\\ &&&0\\&&&&\ddots\\&&&&&0\end{pmatrix}
$$
Note that $b$ is only in the range of $\Sigma$ if its entries $b_{r+1},\dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the least squares solution must be
$$
x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b
$$
From there, note that
$$
U\Sigma V^T x = b \implies\\
\Sigma (V^T x ) = U^T b
$$
By the above argument, the least squares solution for $(V^T x)$ is given by
$V^T x = \Sigma^+ U^T b$. Noting that $\|V^T x\| = \|x\|$, we can use this to conclude that $x = (V \Sigma^+ U^T)b$ must be the least squares solution (for $x$).



I hope you find this explanation sufficient.
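As a quick numerical sanity check of the reduction above, here is a NumPy sketch with an arbitrary rank-deficient matrix (nothing below is specific to the example chosen):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 6))   # 6x6 matrix of rank 3
    b = rng.standard_normal(6)                                       # generically not in the range of A

    U, s, Vt = np.linalg.svd(A)
    keep = s > 1e-10 * s.max()
    s_plus = np.zeros_like(s)
    s_plus[keep] = 1.0 / s[keep]

    y = s_plus * (U.T @ b)      # least-squares solution of the diagonal problem  Sigma y = U^T b
    x = Vt.T @ y                # x = V y = V Sigma^+ U^T b

    # The change of variables y = V^T x preserves norms ...
    assert np.isclose(np.linalg.norm(Vt @ x), np.linalg.norm(x))
    # ... x is a least-squares minimizer (perturbing it cannot shrink the residual) ...
    z = rng.standard_normal(6)
    assert np.linalg.norm(A @ x - b) <= np.linalg.norm(A @ (x + z) - b)
    # ... and it agrees with NumPy's pseudoinverse / minimum-norm lstsq solutions.
    assert np.allclose(x, np.linalg.pinv(A) @ b)
    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])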





























  • A question: in the beginning, did we assume that $\Sigma$ is an $n \times n$ square matrix? What is the reason that the entries from $n+1$ to $n$ should be zero, such that $b$ is in the range of $\Sigma$?
    – Ufuk Can Bicici
    Oct 15 '14 at 5:27










  • Wait one minute; you mean that $b$ should have its entries from $r+1$ to $n$ equal to zero, right?
    – Ufuk Can Bicici
    Oct 15 '14 at 5:40












  • For the moment, I assume that $\Sigma$ is an $n \times n$ square, diagonal matrix with rank $r$, which means it has $r$ nonzero entries on its diagonal. (Correct me if I am wrong, I am not on the best terms with linear algebra :) ) Then $\Sigma x = b$ has only a single unique solution, which is, as you have written, $x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T$. Now, since we have only a single solution, how can we talk about a least squares solution as well? Unfortunately I became heavily confused.
    – Ufuk Can Bicici
    Oct 15 '14 at 9:51












  • For convenience, I was indeed talking about square matrices. If $\Sigma$ has rank $r<n$ and $b$ is in the range of $\Sigma$, then of course $\Sigma x = b$ has infinitely many solutions. What do those solutions look like? Why is the one I presented the least squares solution?
    – Omnomnomnom
    Oct 15 '14 at 11:18










  • What if $x_{r+1},\dots,x_{n}$ are non-zero?
    – Omnomnomnom
    Oct 15 '14 at 11:48

































The pseudoinverse solution from the SVD is derived in "proving standard least square problem with SVD".

Given $\mathbf{A}x=b$, where the data vector $b\notin\mathcal{N}\left( \mathbf{A}^{*} \right)$, the least squares solution exists and is given by
$$
x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}, \quad y\in\mathbb{C}^{n}
$$

where blue vectors are in the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*} \right)}$ and red vectors are in the null space $\color{red}{\mathcal{N}\left( \mathbf{A} \right)}.$
The least squares solutions all achieve the same minimum sum of squared residual errors $r^{2}_{\min}$, and they form an affine space. That is,
$$
\lVert
\mathbf{A} x_{LS} (y) - b
\rVert_{2}^{2} = r^{2}_{\min}
$$

for all values of $y$.

What is the vector in this affine space with the smallest length? The length of the solution vectors is
$$
\lVert
x_{LS}
\rVert_{2}^{2}
=
\lVert
\color{blue}{\mathbf{A}^{\dagger}b} +
\color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}
\rVert_{2}^{2}
=
\lVert
\color{blue}{\mathbf{A}^{\dagger}b}
\rVert_{2}^{2}
+
\underbrace{\lVert
\color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}
\rVert_{2}^{2}}_{y=\mathbf{0}}
$$

The solution vector of minimum length is $\color{blue}{\mathbf{A}^{\dagger}b}$, the point in the affine space closest to the origin.
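The orthogonal splitting above is easy to verify numerically. A sketch, with a random rank-deficient matrix standing in for $\mathbf{A}$ and np.linalg.pinv supplying $\mathbf{A}^{\dagger}$:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 8))   # 5x8 matrix of rank 4
    b = rng.standard_normal(5)

    A_pinv = np.linalg.pinv(A)
    P_null = np.eye(8) - A_pinv @ A          # projector onto the null space of A

    for _ in range(3):
        y = rng.standard_normal(8)
        x_ls = A_pinv @ b + P_null @ y       # a least-squares solution for every choice of y
        # every member of the affine space leaves the same (minimal) residual ...
        assert np.isclose(np.linalg.norm(A @ x_ls - b),
                          np.linalg.norm(A @ (A_pinv @ b) - b))
        # ... and its squared length splits orthogonally, so y = 0 gives the shortest one
        assert np.isclose(np.linalg.norm(x_ls) ** 2,
                          np.linalg.norm(A_pinv @ b) ** 2 + np.linalg.norm(P_null @ y) ** 2)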





























  • Are you using $A^*$ or $A^{\dagger}$ to denote the Hermitian conjugate?
    – glS
    Nov 16 '18 at 13:20










  • @glS. The asterisk denotes the Hermitian conjugate; the dagger the Moore-Penrose pseudoinverse. Thanks to your question, notational errors have been corrected.
    – dantopa
    Nov 19 '18 at 23:40

































Let's fix the dimensions for the sake of making the example simpler and say that $A:\mathbb{R}^8\to\mathbb{R}^5$ and that $\operatorname{rank}(A)=3$.

$A=U\Sigma V^T$ is the singular value decomposition of $A$, and $e_1, ..., e_5$ and $f_1, ..., f_8$ are the standard bases of $\mathbb{R}^5$ and $\mathbb{R}^8$ (respectively), i.e. $e_1 = [1, 0, 0, 0, 0]^T$ etc.

So $A(\mathbb{R}^8)$ is a 3-dimensional subspace of $\mathbb{R}^5$. What are the (or some) geometric interpretations of $U$, $\Sigma$ and $V$?

$U$ sends the basis vectors $e_i$ to the column vectors $Ue_i$ of $U$, which give us an orthogonal basis of $\mathbb{R}^5$ such that the first three column vectors span $\operatorname{Im}(A)$. You can see this by noting that

$U\Sigma f_1 = U\sigma_1 e_1 \ne 0$

$U\Sigma f_2 = U\sigma_2 e_2 \ne 0$

$U\Sigma f_3 = U\sigma_3 e_3 \ne 0$

$U\Sigma f_4 = ... = U\Sigma f_8 = 0$

Since $U$ is orthogonal, its inverse is $U^T$.

Similarly, $V$ sends the basis vectors $f_i$ to the column vectors $Vf_i$ of $V$, which give us an orthogonal basis of $\mathbb{R}^8$ such that the last 5 span $\ker(A)$:

$AVf_i = U\Sigma V^TVf_i = U\Sigma f_i \ne 0$ for $i = 1, 2, 3$

$AVf_i = U\Sigma V^TVf_i = U\Sigma f_i = 0$ for $i = 4, \dots, 8$

And the inverse of $V$ is $V^T$.

$\Sigma$ jams $\mathbb{R}^8$ into $\mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $f_1, f_2, f_3$ onto those spanned by $e_1, e_2, e_3$ (scaling them by $\sigma_1, \sigma_2, \sigma_3$ in the process) while squashing those spanned by $f_4, \dots, f_8$.

It follows that $A$ jams $\mathbb{R}^8$ into $\mathbb{R}^5$ by mapping the one-dimensional spaces spanned by each of $Vf_1, Vf_2, Vf_3$ onto those spanned by $Ue_1, Ue_2, Ue_3$ (scaling them by $\sigma_1, \sigma_2, \sigma_3$ in the process) while squashing those spanned by $Vf_4, \dots, Vf_8$.

The key here is that, restricted to the space spanned by $Vf_1, Vf_2, Vf_3$, $A$ is an isomorphism onto the space spanned by $Ue_1, Ue_2, Ue_3$, and that $V\Sigma^+U^T$ is its inverse (when restricted to the span of $Ue_1, Ue_2, Ue_3$).

If we have $Ax = b$ and $b \in \operatorname{Im}(A)$, it follows that there exists a unique $x' \in \langle Vf_1, Vf_2, Vf_3\rangle$ s.t. $Ax' = b$. Any other solution $x$ in $\mathbb{R}^8$ takes the form $x' + \delta$ for $\delta \in \ker(A)$. Now, since we can decompose $\mathbb{R}^8$ into $\langle Vf_1, Vf_2, Vf_3\rangle \oplus \langle Vf_4, \dots, Vf_8\rangle$, we have $\lvert x\rvert^2 = \langle x' + \delta, x' + \delta\rangle = \lvert x'\rvert^2 + \lvert \delta\rvert^2$ and so $\lvert x\rvert \ge \lvert x'\rvert$; that is, $x'$ is the solution closest to the origin.
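The same bookkeeping can be checked numerically; here is a NumPy sketch with a random matrix of the dimensions used above (purely illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))   # A: R^8 -> R^5, rank 3
    U, s, Vt = np.linalg.svd(A)                                      # U is 5x5, Vt is 8x8
    V = Vt.T

    # A maps Vf_i to sigma_i * Ue_i for i = 1, 2, 3 and squashes Vf_4, ..., Vf_8
    for i in range(8):
        image = A @ V[:, i]
        if i < 3:
            assert np.allclose(image, s[i] * U[:, i])
        else:
            assert np.allclose(image, 0)

    # Take a b in Im(A), so there is a unique solution x' in span(Vf_1, Vf_2, Vf_3)
    b = A @ (V[:, :3] @ rng.standard_normal(3))
    x_prime = np.linalg.pinv(A) @ b                  # V Sigma^+ U^T b
    delta = V[:, 3:] @ rng.standard_normal(5)        # an arbitrary kernel direction
    x_other = x_prime + delta                        # another exact solution

    assert np.allclose(A @ x_other, b)
    assert np.isclose(np.linalg.norm(x_other) ** 2,
                      np.linalg.norm(x_prime) ** 2 + np.linalg.norm(delta) ** 2)
    assert np.linalg.norm(x_other) >= np.linalg.norm(x_prime)   # x' is closest to the origin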

















































The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an ellipse. This can be done in a 3-step process using the SVD:

1. Rotate the unit circle so it can be stretched along its axes
2. Stretch each axis to form the ellipse
3. Rotate again to align the ellipse with the output space of $A$

To solve for $x$, you reverse this process, starting with $b$.

Least squares comes in when step 2 creates an ellipse with a width of zero. When you go through this process in reverse and get to step 2, un-stretching throws away the dimension with a width of zero. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to $x'$.

In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ is the closest point on that line to $b$.

https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf
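A tiny 2D instance of that picture, as a sketch: the rank-1 matrix below collapses the unit circle to a line, and the pseudoinverse solution lands on the point of that line closest to $b$.

    import numpy as np

    # Rank-1 2x2 matrix: the image of the unit circle is a line (a degenerate ellipse)
    A = np.array([[1., 2.],
                  [2., 4.]])
    b = np.array([1., 0.])

    x_prime = np.linalg.pinv(A) @ b   # reverse the steps: rotate, un-stretch (drop the collapsed axis), rotate
    p = A @ x_prime                   # where A actually sends x'

    # p is the orthogonal projection of b onto the line Im(A) = span([1, 2]),
    # i.e. the closest point on that line to b
    d = np.array([1., 2.]) / np.sqrt(5.)
    assert np.allclose(p, (b @ d) * d)
    print(x_prime, p)                 # approximately [0.04 0.08] [0.2 0.4]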


























