Compute the gradient of $f(x)=\|\text{diag}(x)\|$ with the chain rule














Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(x)=\|\text{diag}(x)\|$, where $\text{diag}(x)\in\mathbb{R}^{n\times n}$ is the diagonal matrix with diagonal entries $x_1,x_2,\dots,x_n$, and $\|\cdot\|$ is the spectral norm (matrix 2-norm).



Since the spectral norm of a matrix is its largest singular value, and the singular values of a (square) diagonal matrix are the absolute values of the diagonal entries, we see that $f(x)=\|x\|_\infty$, where $\|\cdot\|_\infty$ is the (vector) sup-norm. In this form, it is easier to deduce the properties of $f$--in particular, it is differentiable at any point $x\in\mathbb{R}^n$ where the largest element of $x$ (in absolute value) is unique. At such a point, the gradient of $f$ is given by
$$
\nabla{f(x)}=\text{sgn}(x_k)e_k,
$$

where $k$ is the index of the (unique) largest entry of $x$ (in absolute value), $e_k$ is the $k^\text{th}$ standard basis vector in $\mathbb{R}^n$, and $\text{sgn}(\cdot)$ is the sign function.
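As a numerical sanity check on this formula (a sketch of my own in NumPy, not part of the original question; the test vector is arbitrary), the closed-form gradient can be compared against central finite differences:

```python
import numpy as np

def f(x):
    # f(x) = ||diag(x)||, spectral norm (largest singular value)
    return np.linalg.norm(np.diag(x), ord=2)

def grad_f_closed_form(x):
    # sgn(x_k) e_k, where k is the index of the largest entry in absolute value
    k = np.argmax(np.abs(x))
    g = np.zeros_like(x)
    g[k] = np.sign(x[k])
    return g

def grad_f_numeric(x, eps=1e-6):
    # central finite differences, entry by entry
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -2.0, 1.1])   # unique largest-magnitude entry at index 1
print(grad_f_closed_form(x))     # [ 0. -1.  0.]
print(grad_f_numeric(x))         # ≈ [ 0. -1.  0.]
```

The check only makes sense at points where the largest-magnitude entry is unique, since $f$ is not differentiable elsewhere.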



I want to deduce the above expression for the gradient using the chain rule applied to $f(x)=(g\circ h)(x)$, where $g:\mathbb{R}^{n\times n}\to\mathbb{R}$ is given by $g(A)=\|A\|$, and $h:\mathbb{R}^n\to\mathbb{R}^{n\times n}$ is given by $h(x)=\text{diag}(x)$.



The "Jacobian" of $h$ is a three-dimensional object, where
$$
\frac{\partial[h(x)]_{ij}}{\partial x_k}=\begin{cases}1,&i=j=k,\\0,&\text{else.}\end{cases}
$$
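Concretely (a sketch of my own, not from the post), stacking the matrix entries columnwise turns this three-dimensional object into an ordinary $n^2\times n$ Jacobian with a single $1$ per column:

```python
import numpy as np

def jacobian_of_diag(n):
    # J[p, k] = d vec(diag(x))_p / d x_k, stacking columns so that entry
    # (i, j) of the matrix gets flat index p = j*n + i.  The derivative is
    # 1 exactly when i = j = k, i.e. at flat index k*n + k in column k.
    J = np.zeros((n * n, n))
    for k in range(n):
        J[k * n + k, k] = 1.0
    return J

J = jacobian_of_diag(3)
print(J.shape)   # (9, 3)
print(J[:, 0])   # derivative of vec(diag(x)) with respect to x_0
```

The choice of flattening order (column-major here) is arbitrary, as long as the same convention is used for both Jacobians.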

The function $g$ is differentiable at any point $A$ where $A$ has a unique largest singular value, in which case the gradient(?) is given by
$$
\nabla{g(A)}=uv^\text{T},
$$

where $u$ and $v$ are the left and right singular vectors (respectively) corresponding to the (unique) largest singular value of $A$.
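This formula can also be checked numerically (my own sketch; the matrix $A$ below is just an example with a unique largest singular value). Note that $uv^\text{T}$ is well defined despite the sign ambiguity of singular vectors, since $(u,v)$ and $(-u,-v)$ give the same outer product:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, -2.0]])

# claimed gradient of the spectral norm: outer product of the top singular pair
U, s, Vt = np.linalg.svd(A)
u, v = U[:, 0], Vt[0, :]
G = np.outer(u, v)

# central finite differences, entry by entry
eps = 1e-6
G_num = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        G_num[i, j] = (np.linalg.norm(A + E, 2) - np.linalg.norm(A - E, 2)) / (2 * eps)

print(np.max(np.abs(G - G_num)))  # small
```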



So I essentially have a three-dimensional object and a 2-dimensional object, and I want to apply the chain rule to get the gradient, a 1-dimensional object (i.e. a vector). A straightforward application suggests "multiplying them together" (not sure that concept is even defined), which seems like it would produce a matrix. What simple thing am I missing here?










      derivatives matrix-calculus chain-rule






      asked Jan 22 at 1:23









David M.























          1 Answer

          A hint rather than a complete answer



It might help to think of $\Bbb R^{n \times n}$ as $\Bbb R^{n^2}$.



Now $h$ is a function from $\Bbb R^n$ to $\Bbb R^{n^2}$ and $g$ is a function from
$\Bbb R^{n^2}$ to $\Bbb R$, so the composite goes from $\Bbb R^n$ to $\Bbb R$. Can you work out the $k$th partial derivative of this?



          When you do, you may find out that the result you get is surprisingly closely related to matrix multiplication (of some sort) of the derivatives of $h$ and $g$...or you may not.



Just to get you started, $$\nabla g(A)_{ni + j} = u_i v_j,$$
where I'm working with indices that go from $0$ to $n-1$ here.



Can you now work out the $ni + j$th entry of the $k$th partial derivative of $g \circ h$?
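Carrying out the hint in code (a sketch of my own, using a row-major flattening so that entry $(i,j)$ gets flat index $ni+j$ as in the answer): once both derivatives are flattened, the chain rule is an ordinary matrix product.

```python
import numpy as np

def gradient_via_chain_rule(x):
    n = len(x)
    # Jacobian of h(x) = diag(x): d[h(x)]_{ij}/dx_k = 1 iff i = j = k,
    # so column k of J_h has a single 1 at flat index n*k + k.
    J_h = np.zeros((n * n, n))
    for k in range(n):
        J_h[n * k + k, k] = 1.0
    # Jacobian of g(A) = ||A|| at A = diag(x): flatten u v^T into a 1 x n^2 row
    U, s, Vt = np.linalg.svd(np.diag(x))
    J_g = np.outer(U[:, 0], Vt[0, :]).reshape(1, n * n)  # row-major flatten
    # chain rule: J_{g∘h} = J_g @ J_h is 1 x n; the gradient is its transpose
    return (J_g @ J_h).ravel()

x = np.array([0.3, -2.0, 1.1])
print(gradient_via_chain_rule(x))  # [ 0. -1.  0.]
```

This agrees with the closed form $\nabla f(x)=\text{sgn}(x_k)e_k$ at any point whose largest-magnitude entry is unique.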






• Thought I might have to resort to this--was being lazy and trying to avoid "index counting" from stacking matrices into vectors. Indeed this works!
  – David M., Jan 22 at 18:53










• And when you were done, did you discover that there was a matrix multiplication hidden in there? (I'm asking seriously...I didn't bother doing it myself, and wonder what the result was...)
  – John Hughes, Jan 22 at 19:24










• Yes it works out following the conventions of the usual multi-dimensional chain rule. The resulting gradient vector is $J_h^\text{T}J_g^\text{T}$, where $J_g\in\mathbb{R}^{1\times n^2}$ is the outer product of $u$ and $v$ stacked columnwise, and $J_h\in\mathbb{R}^{n^2\times n}$, where each row of $J_h$ is the gradient of a component of the matrix $\text{diag}(x)$ stacked columnwise (lots of words, but basically what you would expect)
  – David M., Jan 22 at 19:27













          answered Jan 22 at 1:40









John Hughes
