Is there a closed-form formula for the derivative of the orthogonal polar factor of a matrix?
























$begingroup$


$\newcommand{\psym}{\text{Psym}_n}$
$\newcommand{\sym}{\text{sym}}$
$\newcommand{\Sym}{\operatorname{Sym}}$
$\newcommand{\Skew}{\operatorname{Skew}}$
$\newcommand{\SO}{\operatorname{SO}_n}$
$\renewcommand{\skew}{\operatorname{skew}}$
$\newcommand{\GLp}{\operatorname{GL}_n^+}$



Let $\text{Psym}_n$ be the space of real symmetric positive-definite $n\times n$ matrices, and let $\operatorname{GL}_n^+$ be the group of real $n\times n$ invertible matrices with positive determinant.



Let $O:\operatorname{GL}_n^+\to\operatorname{SO}_n$ be the orthogonal polar factor map, defined by requiring $A=O(A)P$ for some symmetric positive-definite $P$. Note that $O(A)=A\big(\sqrt{A^TA}\big)^{-1}$.
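As a numerical illustration only (it is not the algebraic formula asked for, and it does use an eigendecomposition of $A^TA$), $O(A)=A(\sqrt{A^TA})^{-1}$ can be evaluated directly. A minimal sketch assuming NumPy; the helper names `spd_sqrt` and `polar_factor` are hypothetical:

```python
import numpy as np

def spd_sqrt(S):
    """Positive square root of a symmetric positive-definite matrix S."""
    w, U = np.linalg.eigh(S)          # S = U diag(w) U^T with w > 0
    return (U * np.sqrt(w)) @ U.T     # U diag(sqrt(w)) U^T

def polar_factor(A):
    """Return (O, P) with A = O P, where P = sqrt(A^T A) and O = A P^{-1}."""
    P = spd_sqrt(A.T @ A)
    return A @ np.linalg.inv(P), P

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
if np.linalg.det(A) < 0:
    A[:, 0] *= -1                     # flip a column so that A lies in GL_n^+
O, P = polar_factor(A)

print(np.allclose(O.T @ O, np.eye(4)))      # checks that O is orthogonal
print(np.isclose(np.linalg.det(O), 1.0))    # checks that det O = 1
print(np.allclose(O @ P, A))                # checks that A = O P
```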




Is there a nice "closed-form algebraic formula" for the differential $dO_A$? If not, perhaps there is a formula for $\langle dO_A(B),C\rangle$? (This would be similar to the Levi-Civita connection, where $\nabla_XY$ is characterized implicitly by the Koszul formula.)




I am fine with using positive matrix square roots and inverses, but not with integral formulas or vectorization operations like here or here. (I also don't want to use the singular values of $A$ explicitly.)



Here are some partial results:




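  1. Since $dO_{QA}(QB)=Q\,dO_A(B)$ and $dO_{AQ}(BQ)=dO_A(B)\,Q$ for every $Q\in\operatorname{SO}_n$, the question can be reduced to the case where $A\in\text{Psym}_n$. (The "dual" orthogonal case is easy: $dO_{\mathrm{Id}}(B)=\operatorname{skew}(B)$, where $\operatorname{skew}(B)=\frac12(B-B^T)$, and for every $Q\in\operatorname{SO}_n$, $dO_{Q}(QB)=Q\operatorname{skew}(B)$.)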


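  2. For every $B\in\operatorname{skew}$ (the space of skew-symmetric matrices), $dO_A(OBP)=OB$, where $A=OP$ with $O=O(A)$; i.e., $dO_A(X)=XP^{-1}$ whenever $O^TXP^{-1}\in\operatorname{skew}$. Indeed, set $\alpha(t)=Oe^{tB}P$. Since $e^{tB}\in\operatorname{SO}_n$, $\alpha(t)=(Oe^{tB})P$ is already a polar decomposition, so $O(\alpha(t))=Oe^{tB}$; differentiating at $t=0$ gives $dO_A(OBP)=OB$.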
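The identities in items 1 and 2 can be sanity-checked with central finite differences. This is a sketch assuming NumPy; `polar_factor`, `dO`, `skew` and `random_rotation` are hypothetical helpers, and the tolerance `1e-6` is an arbitrary choice that absorbs finite-difference error:

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def dO(A, X, h=1e-5):
    """Central finite-difference approximation of dO_A(X)."""
    return (polar_factor(A + h * X)[0] - polar_factor(A - h * X)[0]) / (2 * h)

def skew(M):
    """Skew-symmetric part (M - M^T)/2."""
    return (M - M.T) / 2

def random_rotation(rng, n):
    """A random element of SO_n (QR of a Gaussian matrix, sign-corrected)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
A = random_rotation(rng, n) @ (M @ M.T + n * np.eye(n))  # well-conditioned, det A > 0
O, P = polar_factor(A)
Q = random_rotation(rng, n)
X = rng.standard_normal((n, n))
B = skew(rng.standard_normal((n, n)))                     # an element of skew

tol = dict(atol=1e-6)
checks = {
    "dO_{QA}(QX) = Q dO_A(X)": np.allclose(dO(Q @ A, Q @ X), Q @ dO(A, X), **tol),
    "dO_{AQ}(XQ) = dO_A(X) Q": np.allclose(dO(A @ Q, X @ Q), dO(A, X) @ Q, **tol),
    "dO_Id(X) = skew(X)":      np.allclose(dO(np.eye(n), X), skew(X), **tol),
    "dO_A(OBP) = OB, B skew":  np.allclose(dO(A, O @ B @ P), O @ B, **tol),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```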



The problem with calculating $dO_A(OBP)$ when $B\in\text{sym}$ is that $e^{tB}P$ need not be symmetric, even though $e^{tB}$ and $P$ are both positive-definite. If it were symmetric, then $O(\alpha(t))=O$ for all $t$, which would imply $dO_A(OBP)=0=O\operatorname{skew}B$; this is false in general (see below).
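A small numerical example of the symmetric case, again a NumPy finite-difference sketch with hypothetical helper names. For $P=\mathrm{diag}(1,3)$ and the symmetric matrix $B=\begin{pmatrix}0&1\\1&0\end{pmatrix}$, which does not commute with $P$, the product $e^{tB}P$ is not symmetric and $dO_P(BP)$ is a non-zero skew-symmetric matrix, while $\operatorname{skew}B=0$:

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def dO(A, X, h=1e-5):
    """Central finite-difference approximation of dO_A(X)."""
    return (polar_factor(A + h * X)[0] - polar_factor(A - h * X)[0]) / (2 * h)

P = np.diag([1.0, 3.0])
B = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric, B @ B = I, does not commute with P

t = 0.5
expB = np.cosh(t) * np.eye(2) + np.sinh(t) * B   # e^{tB}, valid because B^2 = I
print(expB @ P)                                  # not symmetric

D = dO(P, B @ P)
print(D)                                         # skew-symmetric and non-zero
print((B - B.T) / 2)                             # skew B is the zero matrix
```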





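One could conjecture that $dO_A(OBP)=OB$ holds also for $B\in\text{sym}$, or equivalently that $dO_A(X)=XP^{-1}$ for every $X\in M_n$. This is false, since $X\mapsto XP^{-1}$ is injective while $dO_A$ is not: it maps the $n^2$-dimensional space $M_n$ into the $\frac{n(n-1)}{2}$-dimensional tangent space $T_O\operatorname{SO}_n$.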
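The rank deficiency can also be seen numerically by applying a finite-difference $dO_A$ to the standard basis of $M_n$ and counting the rank of the result. The flattening here is used only to count dimensions; NumPy, the base point, and the rank tolerance `1e-6` are assumptions of this sketch:

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def dO(A, X, h=1e-5):
    """Central finite-difference approximation of dO_A(X)."""
    return (polar_factor(A + h * X)[0] - polar_factor(A - h * X)[0]) / (2 * h)

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # a symmetric positive-definite base point

# Column k of Jac is dO_A(E_k), where E_k runs over the standard basis of M_n.
basis = np.eye(n * n).reshape(n * n, n, n)
Jac = np.column_stack([dO(A, E).ravel() for E in basis])
rank = np.linalg.matrix_rank(Jac, tol=1e-6)

print("rank of dO_A:", rank)
print("dim SO_n    :", n * (n - 1) // 2)
print("dim M_n     :", n * n)
```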




Another possible conjecture is $dO_A(OBP)=O\operatorname{skew}B$. However, this is also false:




Indeed, for $A=P\in\text{Psym}_n$, this reduces to $dO_P(BP)=\operatorname{skew}B$. Suppose that $C:=BP\in\text{sym}$. Then we must have $dO_P(BP)=0$. To see this, differentiate $A=OP$ to obtain
$$ \dot A=\dot O P+O\dot P, \tag{1}$$



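which for $A=P$, $O=\mathrm{Id}$ becomes $\dot A=\dot O P+\dot P$. Note that $C\in\text{sym}\Rightarrow dP_P(C)=C$ (since $P|_{\text{Psym}_n}=\mathrm{Id}_{\text{Psym}_n}$ and $C\in T_P\text{Psym}_n$), i.e. $\dot P=\dot A$ when $\dot A=C$. This gives $\dot OP=0$, and hence $\dot O=0$ since $P$ is invertible.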
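A finite-difference check of the two facts used above, $dO_P(C)=0$ and $dP_P(C)=C$ for symmetric $C$ (NumPy sketch; `d_factors` is a hypothetical helper returning both differentials):

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def d_factors(A, X, h=1e-5):
    """Finite-difference approximations of (dO_A(X), dP_A(X))."""
    Op, Pp = polar_factor(A + h * X)
    Om, Pm = polar_factor(A - h * X)
    return (Op - Om) / (2 * h), (Pp - Pm) / (2 * h)

rng = np.random.default_rng(3)
n = 3
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)          # base point A = P in Psym_n, so O = Id
G = rng.standard_normal((n, n))
C = (G + G.T) / 2                    # a symmetric direction, C in T_P Psym_n

dO_PC, dP_PC = d_factors(P, C)
print(np.allclose(dO_PC, 0, atol=1e-6))   # checks dO_P(C) = 0
print(np.allclose(dP_PC, C, atol=1e-6))   # checks dP_P(C) = C
```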



Thus, we proved that $BP\in\text{sym}$ implies $dO_P(BP)=0$. However, this is incompatible with $dO_P(BP)=\operatorname{skew}B$ in general: $BP\in\text{sym}\iff BP=PB^T$.



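For $P=\mathrm{diag}(\sigma_1,\sigma_2)$ and $B=\begin{pmatrix} 1 & b \\ c & 1 \end{pmatrix}$, this happens if and only if $\sigma_2b=\sigma_1c$. So if $\sigma_1\neq\sigma_2$ and $b\neq0$, choosing $c=\sigma_2b/\sigma_1$ gives $BP\in\text{sym}$ with $c\neq b$, so $B$ is not symmetric. Thus, $dO_P(BP)=0\neq\operatorname{skew}B$.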
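The $2\times2$ counterexample with concrete numbers, say $\sigma_1=1$, $\sigma_2=2$, $b=1$ and hence $c=2$ (an arbitrary choice); NumPy finite-difference sketch with hypothetical helper names:

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def dO(A, X, h=1e-5):
    """Central finite-difference approximation of dO_A(X)."""
    return (polar_factor(A + h * X)[0] - polar_factor(A - h * X)[0]) / (2 * h)

s1, s2 = 1.0, 2.0                     # sigma_1 != sigma_2
b = 1.0                               # any b != 0
c = s2 * b / s1                       # enforces sigma_2 b = sigma_1 c
P = np.diag([s1, s2])
B = np.array([[1.0, b], [c, 1.0]])

BP = B @ P
print(np.allclose(BP, BP.T))          # checks that BP is symmetric
print(dO(P, BP))                      # approximately the zero matrix
print((B - B.T) / 2)                  # skew B, non-zero because b != c
```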





Comment: Equation $(1)$ implies that one can equivalently focus on the "dual" problem of computing $dP_A$ instead of $dO_A$, since $\dot O=(\dot A-O\dot P)P^{-1}$. Here is a previous attempt of mine to go in this direction.
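The comment above can be checked numerically: the NumPy sketch below (hypothetical helpers, finite differences) recovers $dO_A(X)$ from $dP_A(X)$ through equation $(1)$ and compares the result with a direct finite-difference $dO_A(X)$:

```python
import numpy as np

def polar_factor(A):
    """Return (O, P) with A = O P, P = sqrt(A^T A) computed via eigh."""
    w, U = np.linalg.eigh(A.T @ A)
    P = (U * np.sqrt(w)) @ U.T
    return A @ np.linalg.inv(P), P

def d_factors(A, X, h=1e-5):
    """Finite-difference approximations of (dO_A(X), dP_A(X))."""
    Op, Pp = polar_factor(A + h * X)
    Om, Pm = polar_factor(A - h * X)
    return (Op - Om) / (2 * h), (Pp - Pm) / (2 * h)

rng = np.random.default_rng(4)
n = 3
M = rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                      # make Q a rotation
A = Q @ (M @ M.T + n * np.eye(n))      # a generic point of GL_n^+
O, P = polar_factor(A)
X = rng.standard_normal((n, n))        # the direction A-dot

dO_AX, dP_AX = d_factors(A, X)
# Equation (1): X = dO P + O dP, hence dO = (X - O dP) P^{-1}.
dO_from_dP = (X - O @ dP_AX) @ np.linalg.inv(P)
print(np.allclose(dO_AX, dO_from_dP, atol=1e-6))
```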



















$endgroup$

















      soft-question closed-form matrix-calculus matrix-decomposition orthogonal-matrices
















      asked Jan 17 at 10:28









      Asaf Shachar

      5,64631141


























          1 Answer












          $begingroup$

          Define the matrices
          $$\eqalign{
          F &= A(A^TA)^{-1/2} \cr
          Q &= I-AA^+ = \,Q^T \cr
          V^T &= A^+ \cr
          }$$

          where $A^+$ denotes the pseudoinverse.
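For a tall matrix $A$ with full column rank (the setting in which $Q\neq0$), these definitions can be checked numerically. A NumPy sketch; `spd_inv_sqrt` is a hypothetical helper, and the last lines confirm that $Q=0$ when $A$ is square and invertible:

```python
import numpy as np

def spd_inv_sqrt(S):
    """S^{-1/2} for a symmetric positive-definite S, via eigh."""
    w, U = np.linalg.eigh(S)
    return (U / np.sqrt(w)) @ U.T     # U diag(1/sqrt(w)) U^T

rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))       # full column rank (almost surely)

F = A @ spd_inv_sqrt(A.T @ A)
A_pinv = np.linalg.pinv(A)
Q = np.eye(m) - A @ A_pinv
V = A_pinv.T

print(np.allclose(F.T @ F, np.eye(n)))              # F has orthonormal columns
print(np.allclose(Q, Q.T), np.allclose(Q @ Q, Q))   # Q is an orthogonal projector

# For a square invertible matrix, A^+ = A^{-1} and Q vanishes.
A_sq = rng.standard_normal((n, n))
print(np.allclose(np.eye(n) - A_sq @ np.linalg.pinv(A_sq), 0, atol=1e-8))
```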



          Then
          $$\eqalign{
          FF^T &= A(A^TA)^{-1}A^T \cr &= AA^+ \cr
          d(FF^T) &= d(AA^+) \cr
          dF\,F^T+F\,dF^T &= Q\,dA\,A^+ + V\,dA^T\,Q \cr
          }$$
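A finite-difference check of $FF^T=AA^+$ and of the differential identity above, for a random tall $A$ (NumPy sketch, hypothetical helper names, step `h=1e-5` chosen arbitrarily):

```python
import numpy as np

def spd_inv_sqrt(S):
    """S^{-1/2} for a symmetric positive-definite S, via eigh."""
    w, U = np.linalg.eigh(S)
    return (U / np.sqrt(w)) @ U.T

def F_of(A):
    """F = A (A^T A)^{-1/2}."""
    return A @ spd_inv_sqrt(A.T @ A)

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))
dA = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)
Q = np.eye(m) - A @ A_pinv
V = A_pinv.T

F = F_of(A)
print(np.allclose(F @ F.T, A @ A_pinv))                # F F^T = A A^+

h = 1e-5
dF = (F_of(A + h * dA) - F_of(A - h * dA)) / (2 * h)   # finite-difference dF
lhs = dF @ F.T + F @ dF.T
rhs = Q @ dA @ A_pinv + V @ dA.T @ Q
print(np.allclose(lhs, rhs, atol=1e-6))
```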

          Since you have stressed that you don't want to use vectorization, let's try to solve this using fourth-order tensors instead.



          There are three isotropic fourth-order tensors $({\mathcal H},{\mathcal J},{\mathcal K})$, and their components can be expressed in terms of Kronecker deltas:
          $$\eqalign{
          {\mathcal H}_{ijkl} &= \delta_{ik}\,\delta_{jl} \cr
          {\mathcal J}_{ijkl} &= \delta_{ij}\,\delta_{kl} \cr
          {\mathcal K}_{ijkl} &= \delta_{il}\,\delta_{jk} \cr
          }$$
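With the double contraction $(\mathcal T:X)_{ij}=\sum_{kl}\mathcal T_{ijkl}X_{kl}$, $\mathcal H$ acts as the identity, $\mathcal K$ as the transpose, and $\mathcal J$ as $X\mapsto\operatorname{tr}(X)\,I$. A NumPy `einsum` sketch (the helper `ddot` is hypothetical):

```python
import numpy as np

n = 3
I = np.eye(n)
H = np.einsum('ik,jl->ijkl', I, I)   # H_ijkl = delta_ik delta_jl
J = np.einsum('ij,kl->ijkl', I, I)   # J_ijkl = delta_ij delta_kl
K = np.einsum('il,jk->ijkl', I, I)   # K_ijkl = delta_il delta_jk

def ddot(T, X):
    """Double contraction T:X of a fourth-order tensor with a matrix."""
    return np.einsum('ijkl,kl->ij', T, X)

X = np.arange(9.0).reshape(n, n)
print(np.allclose(ddot(H, X), X))                 # H acts as the identity
print(np.allclose(ddot(J, X), np.trace(X) * I))   # J:X = tr(X) I
print(np.allclose(ddot(K, X), X.T))               # K acts as the transpose
```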

          We can use these to rearrange that last differential:
          $$\eqalign{
          ({\mathcal H}F + F{\mathcal K}):dF
          &= (Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K}):dA \cr
          }$$
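The rearrangement can be verified with `einsum`, using the single-contraction convention from the end of the answer: $(\mathcal HF+F\mathcal K):dF=dF\,F^T+F\,dF^T$ and $(Q\mathcal HV+V\mathcal HQ:\mathcal K):dA=Q\,dA\,V^T+V\,dA^TQ$, where the second identity uses $Q=Q^T$. Since both identities are purely algebraic, random square matrices suffice; the helper names are hypothetical:

```python
import numpy as np

n = 3
I = np.eye(n)
H = np.einsum('ik,jl->ijkl', I, I)
K = np.einsum('il,jk->ijkl', I, I)

def left(M, T):   # (M T)_ijkl = sum_m M_im T_mjkl
    return np.einsum('im,mjkl->ijkl', M, T)

def right(T, M):  # (T M)_ijkl = sum_m T_ijkm M_ml
    return np.einsum('ijkm,ml->ijkl', T, M)

def ddot4(T, U):  # (T:U)_ijmn = sum_kl T_ijkl U_klmn
    return np.einsum('ijkl,klmn->ijmn', T, U)

def ddot(T, X):   # (T:X)_ij = sum_kl T_ijkl X_kl
    return np.einsum('ijkl,kl->ij', T, X)

rng = np.random.default_rng(7)
F, dF, V, dA = (rng.standard_normal((n, n)) for _ in range(4))
G = rng.standard_normal((n, n))
Q = G + G.T                                   # the identity needs Q = Q^T

lhs_tensor = right(H, F) + left(F, K)
rhs_tensor = right(left(Q, H), V) + ddot4(right(left(V, H), Q), K)

print(np.allclose(ddot(lhs_tensor, dF), dF @ F.T + F @ dF.T))
print(np.allclose(ddot(rhs_tensor, dA), Q @ dA @ V.T + V @ dA.T @ Q))
```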

          Now we need to calculate a fourth-order tensor ${\mathcal P}$ which is the inverse (under the double-contraction product) of the tensor on the LHS, i.e.
          $$\eqalign{
          {\mathcal H} &= {\mathcal P}:\big({\mathcal H}F + F{\mathcal K}\big)\cr
          }$$
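One caveat, in line with the comments below: when $A$ is square and invertible, $F$ is orthogonal and $\mathcal M=\mathcal HF+F\mathcal K$ satisfies $\mathcal M:X=XF^T+FX^T$, whose image consists of symmetric matrices. Hence $\mathcal M$ has rank $n(n+1)/2<n^2$ and the inverse $\mathcal P$ cannot exist in that case. A NumPy sketch that counts the rank (flattening only for that purpose; the tolerance is an assumption):

```python
import numpy as np

n = 3
I = np.eye(n)
H = np.einsum('ik,jl->ijkl', I, I)
K = np.einsum('il,jk->ijkl', I, I)

rng = np.random.default_rng(8)
F, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a square orthogonal F

# M = H F + F K (single contractions), so that M:X = X F^T + F X^T.
M = np.einsum('ijkm,ml->ijkl', H, F) + np.einsum('im,mjkl->ijkl', F, K)

# Flatten (i,j) and (k,l) only to count the rank of X -> M:X.
rank = np.linalg.matrix_rank(M.reshape(n * n, n * n), tol=1e-8)
print(rank, n * (n + 1) // 2, n * n)
```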

          This allows us to isolate $dF$ and calculate the gradient as
          $$\eqalign{
          dF
          &= \Big({\mathcal P}:(Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K})\Big):dA \cr
          \frac{\partial F}{\partial A}
          &= {\mathcal P}:(Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K}) \cr
          }$$

          In the above, juxtaposition indicates a single-contraction product, while a colon represents a double-contraction product. Here are some examples in component form:
          $$\eqalign{
          {\mathcal A} &= {\mathcal B}:{\mathcal C} &\implies
          {\mathcal A}_{ijmn} &= \sum_{kl}{\mathcal B}_{ijkl}\,{\mathcal C}_{klmn}
          \cr
          \alpha &= B:C &\implies
          \alpha &= \sum_{ij}B_{ij}\,C_{ij}
          \cr
          A &= BC &\implies
          A_{ik} &= \sum_{j}B_{ij}\,C_{jk}
          \cr
          {\mathcal A} &= B\,{\mathcal C} &\implies
          {\mathcal A}_{iklm} &= \sum_{j}B_{ij}\,{\mathcal C}_{jklm}
          \cr
          }$$
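The four component formulas correspond directly to `numpy.einsum` subscripts; the sketch cross-checks each one against an equivalent NumPy operation (random inputs, hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 3
B2, C2 = rng.standard_normal((2, n, n))            # two matrices
B4, C4 = rng.standard_normal((2, n, n, n, n))      # two fourth-order tensors

A4 = np.einsum('ijkl,klmn->ijmn', B4, C4)   # A = B:C  (tensor-tensor)
alpha = np.einsum('ij,ij->', B2, C2)        # alpha = B:C  (matrix-matrix)
A2 = np.einsum('ij,jk->ik', B2, C2)         # A = BC
A4b = np.einsum('ij,jklm->iklm', B2, C4)    # A = B C  (matrix-tensor)

print(np.isclose(alpha, np.trace(B2.T @ C2)))      # B:C = tr(B^T C)
print(np.allclose(A2, B2 @ C2))
flat = B4.reshape(n * n, n * n) @ C4.reshape(n * n, n * n)
print(np.allclose(A4, flat.reshape(n, n, n, n)))
print(np.allclose(A4b, np.tensordot(B2, C4, axes=1)))
```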

















          $endgroup$













          • Thanks, but I am not sure I understand some things: as I defined it, the matrix $A$ is invertible, and so $A^+=A^{-1}$ (in particular $Q=0$). Also, $FF^T=\mathrm{Id}$, since $F$ is orthogonal; do you gain anything from writing $FF^T=AA^+$?
            – Asaf Shachar
            Jan 18 at 5:58












          • Also, differentiating $d(FF^T) = d(AA^+)$ should result in $dF\,F^T+F\,dF^T = dA\,A^+ + A\,dA^{+}$, while you wrote $dF\,F^T+F\,dF^T = Q\,dA\,A^+ + V\,dA^T\,Q$...
            – Asaf Shachar
            Jan 18 at 6:02










          • If $A^+=A^{-1}$ then $Q=0$, in which case you need to find the tensor equivalent of the nullspace projector to solve the problem. Let ${\mathcal M}=({\mathcal H}F+F{\mathcal K})$ and let's re-purpose ${\mathcal P}$ to be the pseudoinverse rather than the regular inverse. So the gradient is $$\eqalign{{\mathcal M}:{\mathcal P}:{\mathcal M} &= {\mathcal M} \quad&({\rm defines\,pseudoinverse})\cr {\mathcal M}:dF &= 0 \quad&({\rm dF\,lies\,in\,the\,nullspace})\cr dF &= ({\mathcal H}-{\mathcal P}:{\mathcal M}):dA \cr \frac{\partial F}{\partial A} &= ({\mathcal H}-{\mathcal P}:{\mathcal M}) \cr }$$
            – lynn
            Jan 19 at 22:26











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "69"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3076821%2fis-there-a-closed-form-formula-for-the-derivative-of-the-orthogonal-polar-factor%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0












          $begingroup$

          Define the matrices
          $$eqalign{
          F &= A(A^TA)^{-1/2} cr
          Q &= I-AA^+ = ,Q^T cr
          V^T &= A^+ cr
          }$$

          where $A^+$ denotes the pseudoinverse.



          Then
          $$eqalign{
          FF^T &= A(A^TA)^{-1}A^T cr&= AA^+ cr
          d(FF^T) &= d(AA^+) cr
          dF,F^T+F,dF^T &= Q,dA,A^+ + V,dA^T,Q cr
          }$$

          At this point, you have stressed that you don't want to use vectorization, so let's try solving it using 4th order tensors.



          There are 3 isotropic tensors $({mathcal H},{mathcal J},{mathcal K})$ and their components can be expressed in terms of Kronecker symbols.
          $$eqalign{
          {mathcal H}_{ijkl} &= delta_{ik},delta_{jl} cr
          {mathcal J}_{ijkl} &= delta_{ij},delta_{kl} cr
          {mathcal K}_{ijkl} &= delta_{il},delta_{jk} cr
          }$$

          We can use these to rearrange that last differential
          $$eqalign{
          ({mathcal H}F + F{mathcal K}):dF
          &= (Q{mathcal H}V + V{mathcal H}Q:{mathcal K}):dA cr
          }$$

          Now we need to calculate a 4th order tensor ${mathcal P}$ which is the inverse (under the double-contraction product) of the tensor on the LHS, i.e.
          $$eqalign{
          {mathcal H}&= {mathcal P}:big({mathcal H}F + F{mathcal K}big)cr
          }$$

          This allow us to isolate $dF$ and calculate the gradient as
          $$eqalign{
          dF
          &= Big({mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K})Big):dA cr
          frac{partial F}{partial A}
          &= {mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K}) cr
          }$$

          In the above, juxtaposition indicates a single-contraction product, while a colon represents double-contraction. Here are some examples in component form
          $$eqalign{
          {mathcal A} &= {mathcal B}:{mathcal C} &implies
          {mathcal A}_{ijmn} &= sum_{kl}{mathcal B}_{ijkl},{mathcal C}_{klmn}
          cr
          {alpha} &= {B}:{C} &implies
          {alpha} &= sum_{ij}{B}_{ij},{C}_{ij}
          cr
          {A} &= BC &implies
          {A}_{ik} &= sum_{j}{B}_{ij},{C}_{jk}
          cr
          {mathcal A} &= B,{mathcal C} &implies
          {mathcal A}_{iklm} &= sum_{j}{B}_{ij},{mathcal C}_{jklm}
          cr
          }$$






          share|cite|improve this answer











          $endgroup$













          • $begingroup$
            Thanks, but I am not sure I understand some things: As I defined it, the matrix $A$ is invertible, and so $A^+=A^{-1}$. (In particular $Q=0$ ). Also, $FF^T=Id$, since $F$ is orthogonal; Do you gain anything from writing $FF^T=AA^+$?
            $endgroup$
            – Asaf Shachar
            Jan 18 at 5:58












          • $begingroup$
            Also, differentiating $d(FF^T) = d(AA^+) $ should result in $dF,F^T+F,dF^T =dA,A^+ + A,dA^{+}$, while you wrote $dF,F^T+F,dF^T = Q,dA,A^+ + V,dA^T,Q $...
            $endgroup$
            – Asaf Shachar
            Jan 18 at 6:02










          • $begingroup$
            If $A^+=A^{-1}$ then $Q=0$. In which case you need to find the tensor equivalent of the nullspace projector to solve the problem. Let ${mathcal M}=({mathcal H}F+F{mathcal K})$ and let's re-purpose ${mathcal P}$ to be the pseudoinverse rather than the regular inverse. So the gradient is $$eqalign{{mathcal M}:{mathcal P}:{mathcal M}&={mathcal M}quad&({rm defines,pseudoinverse})cr {mathcal M}:dF &= 0quad&({rm dF,lies,in,the,nullspace})cr dF &= ({mathcal H}-{mathcal P}:{mathcal M}):dA cr frac{partial F}{partial A} &= ({mathcal H}-{mathcal P}:{mathcal M}) cr }$$
            $endgroup$
            – lynn
            Jan 19 at 22:26
















          0












          $begingroup$

          Define the matrices
          $$eqalign{
          F &= A(A^TA)^{-1/2} cr
          Q &= I-AA^+ = ,Q^T cr
          V^T &= A^+ cr
          }$$

          where $A^+$ denotes the pseudoinverse.



          Then
          $$eqalign{
          FF^T &= A(A^TA)^{-1}A^T cr&= AA^+ cr
          d(FF^T) &= d(AA^+) cr
          dF,F^T+F,dF^T &= Q,dA,A^+ + V,dA^T,Q cr
          }$$

          At this point, you have stressed that you don't want to use vectorization, so let's try solving it using 4th order tensors.



          There are 3 isotropic tensors $({mathcal H},{mathcal J},{mathcal K})$ and their components can be expressed in terms of Kronecker symbols.
          $$eqalign{
          {mathcal H}_{ijkl} &= delta_{ik},delta_{jl} cr
          {mathcal J}_{ijkl} &= delta_{ij},delta_{kl} cr
          {mathcal K}_{ijkl} &= delta_{il},delta_{jk} cr
          }$$

          We can use these to rearrange that last differential
          $$eqalign{
          ({mathcal H}F + F{mathcal K}):dF
          &= (Q{mathcal H}V + V{mathcal H}Q:{mathcal K}):dA cr
          }$$

          Now we need to calculate a 4th order tensor ${mathcal P}$ which is the inverse (under the double-contraction product) of the tensor on the LHS, i.e.
          $$eqalign{
          {mathcal H}&= {mathcal P}:big({mathcal H}F + F{mathcal K}big)cr
          }$$

          This allow us to isolate $dF$ and calculate the gradient as
          $$eqalign{
          dF
          &= Big({mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K})Big):dA cr
          frac{partial F}{partial A}
          &= {mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K}) cr
          }$$

          In the above, juxtaposition indicates a single-contraction product, while a colon represents double-contraction. Here are some examples in component form
          $$eqalign{
          {mathcal A} &= {mathcal B}:{mathcal C} &implies
          {mathcal A}_{ijmn} &= sum_{kl}{mathcal B}_{ijkl},{mathcal C}_{klmn}
          cr
          {alpha} &= {B}:{C} &implies
          {alpha} &= sum_{ij}{B}_{ij},{C}_{ij}
          cr
          {A} &= BC &implies
          {A}_{ik} &= sum_{j}{B}_{ij},{C}_{jk}
          cr
          {mathcal A} &= B,{mathcal C} &implies
          {mathcal A}_{iklm} &= sum_{j}{B}_{ij},{mathcal C}_{jklm}
          cr
          }$$






          share|cite|improve this answer











          $endgroup$













          • $begingroup$
            Thanks, but I am not sure I understand some things: As I defined it, the matrix $A$ is invertible, and so $A^+=A^{-1}$. (In particular $Q=0$ ). Also, $FF^T=Id$, since $F$ is orthogonal; Do you gain anything from writing $FF^T=AA^+$?
            $endgroup$
            – Asaf Shachar
            Jan 18 at 5:58












          • $begingroup$
            Also, differentiating $d(FF^T) = d(AA^+) $ should result in $dF,F^T+F,dF^T =dA,A^+ + A,dA^{+}$, while you wrote $dF,F^T+F,dF^T = Q,dA,A^+ + V,dA^T,Q $...
            $endgroup$
            – Asaf Shachar
            Jan 18 at 6:02










          • $begingroup$
            If $A^+=A^{-1}$ then $Q=0$. In which case you need to find the tensor equivalent of the nullspace projector to solve the problem. Let ${mathcal M}=({mathcal H}F+F{mathcal K})$ and let's re-purpose ${mathcal P}$ to be the pseudoinverse rather than the regular inverse. So the gradient is $$eqalign{{mathcal M}:{mathcal P}:{mathcal M}&={mathcal M}quad&({rm defines,pseudoinverse})cr {mathcal M}:dF &= 0quad&({rm dF,lies,in,the,nullspace})cr dF &= ({mathcal H}-{mathcal P}:{mathcal M}):dA cr frac{partial F}{partial A} &= ({mathcal H}-{mathcal P}:{mathcal M}) cr }$$
            $endgroup$
            – lynn
            Jan 19 at 22:26














          0












          0








          0





          $begingroup$

          Define the matrices
          $$eqalign{
          F &= A(A^TA)^{-1/2} cr
          Q &= I-AA^+ = ,Q^T cr
          V^T &= A^+ cr
          }$$

          where $A^+$ denotes the pseudoinverse.



          Then
          $$eqalign{
          FF^T &= A(A^TA)^{-1}A^T cr&= AA^+ cr
          d(FF^T) &= d(AA^+) cr
          dF,F^T+F,dF^T &= Q,dA,A^+ + V,dA^T,Q cr
          }$$

          At this point, you have stressed that you don't want to use vectorization, so let's try solving it using 4th order tensors.



          There are 3 isotropic tensors $({mathcal H},{mathcal J},{mathcal K})$ and their components can be expressed in terms of Kronecker symbols.
          $$eqalign{
          {mathcal H}_{ijkl} &= delta_{ik},delta_{jl} cr
          {mathcal J}_{ijkl} &= delta_{ij},delta_{kl} cr
          {mathcal K}_{ijkl} &= delta_{il},delta_{jk} cr
          }$$

          We can use these to rearrange that last differential
          $$eqalign{
          ({mathcal H}F + F{mathcal K}):dF
          &= (Q{mathcal H}V + V{mathcal H}Q:{mathcal K}):dA cr
          }$$

          Now we need to calculate a 4th order tensor ${mathcal P}$ which is the inverse (under the double-contraction product) of the tensor on the LHS, i.e.
          $$eqalign{
          {mathcal H}&= {mathcal P}:big({mathcal H}F + F{mathcal K}big)cr
          }$$

          This allow us to isolate $dF$ and calculate the gradient as
          $$eqalign{
          dF
          &= Big({mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K})Big):dA cr
          frac{partial F}{partial A}
          &= {mathcal P}:(Q{mathcal H}V + V{mathcal H}Q:{mathcal K}) cr
          }$$

          In the above, juxtaposition indicates a single-contraction product, while a colon represents double-contraction. Here are some examples in component form
          $$eqalign{
          {mathcal A} &= {mathcal B}:{mathcal C} &implies
          {mathcal A}_{ijmn} &= sum_{kl}{mathcal B}_{ijkl},{mathcal C}_{klmn}
          cr
          {alpha} &= {B}:{C} &implies
          {alpha} &= sum_{ij}{B}_{ij},{C}_{ij}
          cr
          {A} &= BC &implies
          {A}_{ik} &= sum_{j}{B}_{ij},{C}_{jk}
          cr
          {mathcal A} &= B,{mathcal C} &implies
          {mathcal A}_{iklm} &= sum_{j}{B}_{ij},{mathcal C}_{jklm}
          cr
          }$$






          share|cite|improve this answer











          $endgroup$


















          edited Jan 17 at 22:10

























          answered Jan 17 at 21:47









          lynn












          • Thanks, but I am not sure I understand some things. As I defined it, the matrix $A$ is invertible, so $A^+=A^{-1}$ (in particular, $Q=0$). Also, $FF^T=\mathrm{Id}$, since $F$ is orthogonal. Do you gain anything from writing $FF^T=AA^+$?
            – Asaf Shachar
            Jan 18 at 5:58












          • Also, differentiating $FF^T = AA^+$ should give $dF\,F^T+F\,dF^T = dA\,A^+ + A\,dA^+$, while you wrote $dF\,F^T+F\,dF^T = Q\,dA\,A^+ + V\,dA^T\,Q$...
            – Asaf Shachar
            Jan 18 at 6:02










          • If $A^+=A^{-1}$ then $Q=0$, in which case you need to find the tensor equivalent of the nullspace projector to solve the problem. Let ${\mathcal M}=({\mathcal H}F+F{\mathcal K})$ and let's repurpose ${\mathcal P}$ to be the pseudoinverse rather than the regular inverse. The gradient then follows from $$\eqalign{{\mathcal M}:{\mathcal P}:{\mathcal M}&={\mathcal M}\quad&({\rm defines\,the\,pseudoinverse})\cr {\mathcal M}:dF &= 0\quad&({\rm dF\,lies\,in\,the\,nullspace})\cr dF &= ({\mathcal H}-{\mathcal P}:{\mathcal M}):dA \cr \frac{\partial F}{\partial A} &= ({\mathcal H}-{\mathcal P}:{\mathcal M}) \cr }$$
            – lynn
            Jan 19 at 22:26
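The points raised in this exchange can be checked numerically. Below is a minimal NumPy sketch under stated assumptions: it computes $F=A(A^TA)^{-1/2}$ with a hypothetical helper `polar_factor`, using an eigendecomposition of $A^TA$ purely for numerics, and approximates $dF$ along a random direction by central differences (the step size `h` and the random test matrix are arbitrary choices). For an invertible $A$ with positive determinant it confirms that $A^+=A^{-1}$, $Q=0$ and $FF^T=\mathrm{Id}$, and that the finite-difference $dF$ satisfies ${\mathcal M}:dF = dF\,F^T+F\,dF^T = 0$, i.e. $F^T dF$ is skew-symmetric.

```python
import numpy as np


def inv_sqrtm_spd(S):
    """S^(-1/2) for symmetric positive-definite S, via an eigendecomposition."""
    w, U = np.linalg.eigh(S)
    return (U / np.sqrt(w)) @ U.T


def polar_factor(A):
    """Orthogonal polar factor F = A (A^T A)^(-1/2) of an invertible matrix A."""
    return A @ inv_sqrtm_spd(A.T @ A)


rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
if np.linalg.det(A) < 0:  # flip one column so that A lies in GL_n^+
    A[:, 0] *= -1
F = polar_factor(A)

# For invertible A: A^+ = A^{-1}, hence Q = I - A A^+ = 0, and F F^T = I.
Ap = np.linalg.pinv(A)
assert np.allclose(Ap, np.linalg.inv(A))
assert np.allclose(np.eye(n) - A @ Ap, 0)
assert np.allclose(F @ F.T, np.eye(n))
assert np.linalg.det(F) > 0  # F lies in SO_n

# Central-difference approximation of dF = dO_A(dA) along a random direction dA.
dA = rng.standard_normal((n, n))
h = 1e-6
dF = (polar_factor(A + h * dA) - polar_factor(A - h * dA)) / (2 * h)

# M:dF = dF F^T + F dF^T vanishes, i.e. F^T dF is skew-symmetric.
assert np.allclose(dF @ F.T + F @ dF.T, 0, atol=1e-6)
S = F.T @ dF
assert np.allclose(S, -S.T, atol=1e-6)
```

The condition ${\mathcal M}:dF=0$ only says that $F^TdF$ is skew-symmetric; on its own it does not pin down which element of that nullspace $dF$ is.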



































