Is there a closed-form formula for the derivative of the orthogonal polar factor of a matrix?
$\newcommand{\psym}{\text{Psym}_n}$
$\newcommand{\sym}{\text{sym}}$
$\newcommand{\Sym}{\operatorname{Sym}}$
$\newcommand{\Skew}{\operatorname{Skew}}$
$\newcommand{\SO}{\operatorname{SO}_n}$
$\renewcommand{\skew}{\operatorname{skew}}$
$\newcommand{\GLp}{\operatorname{GL}_n^+}$
Let $\psym$ be the space of real symmetric positive-definite $n \times n$ matrices, and $\GLp$ the group of real $n \times n$ invertible matrices with positive determinant.
Let $O:\GLp \to \SO$ be the orthogonal polar factor map, defined by requiring $A=O(A)P$ for some symmetric positive-definite $P$. Note that $O(A)=A(\sqrt{A^TA})^{-1}$.
Is there a nice "closed-form algebraic formula" for the differential $dO_A$? If not, perhaps there is a formula for $\langle dO_A(B),C \rangle$? (This would be similar to the Levi-Civita connection, where the Koszul formula gives an implicit characterization of $\nabla_XY$.)
I am fine with using positive matrix square roots and inverses, but not with using integral formulas or vectorization operations like here or here. (I also don't want to use the singular values of $A$ explicitly.)
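For concreteness, the map $O$ is easy to compute numerically. Here is a minimal sketch, assuming NumPy; the helper name `polar_factor` is introduced here and is not part of the question. The inverse square root is taken via an eigendecomposition of $A^TA$, which is allowed by the constraints above.

```python
import numpy as np

def polar_factor(A):
    """Orthogonal polar factor O(A) = A (A^T A)^{-1/2},
    via an eigendecomposition of the SPD matrix A^T A."""
    w, V = np.linalg.eigh(A.T @ A)            # A^T A = V diag(w) V^T, w > 0
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T   # (A^T A)^{-1/2}
    return A @ inv_sqrt

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
if np.linalg.det(A) < 0:                      # force A into GL_n^+
    A[:, 0] *= -1

O = polar_factor(A)
P = O.T @ A                                   # the SPD factor, A = O P
assert np.allclose(O.T @ O, np.eye(4))        # O is orthogonal
assert np.allclose(P, P.T)                    # P is symmetric
assert np.all(np.linalg.eigvalsh(P) > 0)      # P is positive-definite
```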
Here are some partial results:
Since $dO_{QA}(QB)=Q\,dO_A(B)$ (and $dO_{AQ}(BQ)=dO_A(B)Q$), the question can be reduced to the case where $A \in \psym$. (The "dual" orthogonal case is easy: $dO_{\text{Id}}(B)=\skew(B)$, and for every $Q \in \SO$, $dO_{Q}(QB)=Q\skew B$.)
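The equivariance identities above can be checked numerically with central finite differences; a sketch assuming NumPy, where `polar_factor` and `dO` are helper names introduced for this check:

```python
import numpy as np

def polar_factor(A):
    w, V = np.linalg.eigh(A.T @ A)
    return A @ (V @ np.diag(w ** -0.5) @ V.T)

def dO(A, B, h=1e-6):
    # central finite-difference approximation of dO_A(B)
    return (polar_factor(A + h * B) - polar_factor(A - h * B)) / (2 * h)

rng = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # well-conditioned, det > 0
B = rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                   # force Q into SO_n

# left equivariance: dO_{QA}(QB) = Q dO_A(B)
assert np.allclose(dO(Q @ A, Q @ B), Q @ dO(A, B), atol=1e-4)
# right equivariance: dO_{AQ}(BQ) = dO_A(B) Q
assert np.allclose(dO(A @ Q, B @ Q), dO(A, B) @ Q, atol=1e-4)
```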
For every $B \in \Skew$, $dO_A(OBP)=OB$; i.e. $dO_A(X)=XP^{-1}$ whenever $O^TXP^{-1} \in \Skew$. Indeed, set $\alpha(t)=Oe^{tB}P$. Then $O(\alpha(t))=Oe^{tB}$, so $dO_A(OBP)=OB$.
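This formula for skew directions can also be confirmed by finite differences; a sketch assuming NumPy, reusing the hypothetical helpers `polar_factor` and `dO` from above:

```python
import numpy as np

def polar_factor(A):
    w, V = np.linalg.eigh(A.T @ A)
    return A @ (V @ np.diag(w ** -0.5) @ V.T)

def dO(A, X, h=1e-6):
    return (polar_factor(A + h * X) - polar_factor(A - h * X)) / (2 * h)

rng = np.random.default_rng(2)
A = np.eye(3) + 0.3 * rng.standard_normal((3, 3))
if np.linalg.det(A) < 0:
    A[:, 0] *= -1
O = polar_factor(A)
P = O.T @ A

S = rng.standard_normal((3, 3))
B = S - S.T                                  # a skew-symmetric direction
# dO_A(O B P) = O B for skew B
assert np.allclose(dO(A, O @ B @ P), O @ B, atol=1e-4)
```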
The problem with calculating $dO_A(OBP)$ when $B \in \sym$ is that $e^{tB}P$ need not be symmetric, even though $e^{tB}$ and $P$ are both positive-definite. If it were always symmetric, this would imply $dO_A(OBP)=O\skew B$, which is false in general (see below).
One could conjecture that $dO_A(OBP)=OB$ holds also for $B \in \sym$, or equivalently that $dO_A(X)=XP^{-1}$ for every $X \in M_n$. This is false, since $dO_A$ is not injective (the dimensions are incompatible).
Another possible conjecture is $dO_A(OBP)=O\skew B$. However, this is also false:
Indeed, for $A=P \in \psym$, this reduces to $dO_P(BP)=\skew B$. Suppose that $C:=BP \in \sym$. Then we must have $dO_P(BP)=0$: by differentiating $A=OP$ we obtain
$$ \dot A=\dot O P+O\dot P, \tag{1}$$
which for $A=P$, $O=\text{Id}$ becomes $\dot A=\dot O P+\dot P$. Note that $C \in \sym \Rightarrow dP_P(C)=C$ (since $P|_{\psym}=\text{Id}_{\psym}$ and $C \in T_P\psym$), i.e. $\dot P=\dot A$, which implies $\dot O=0$.
Thus, we proved that $BP \in \sym$ implies $dO_P(BP)=0$. However, this is incompatible with $dO_P(BP)=\skew B$ in general: $BP \in \sym \iff BP=PB^T$.
For $P=\text{diag}(\sigma_1,\sigma_2)$, $B=\begin{pmatrix} 1 & b \\ c & 1 \end{pmatrix}$, this happens if and only if $\sigma_2 b=\sigma_1 c$; so if $\sigma_1 \neq \sigma_2$, we can choose such a $B$ that is not symmetric. Thus $dO_P(BP)=0 \neq \skew B$.
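The counterexample can be verified numerically; a sketch assuming NumPy, with the hypothetical helpers `polar_factor` and `dO` as before:

```python
import numpy as np

def polar_factor(A):
    w, V = np.linalg.eigh(A.T @ A)
    return A @ (V @ np.diag(w ** -0.5) @ V.T)

def dO(A, X, h=1e-6):
    return (polar_factor(A + h * X) - polar_factor(A - h * X)) / (2 * h)

s1, s2 = 2.0, 1.0                     # distinct entries of P
P = np.diag([s1, s2])
b = 1.0
c = s2 * b / s1                       # enforce s2*b = s1*c, i.e. BP symmetric
B = np.array([[1.0, b], [c, 1.0]])
BP = B @ P
assert np.allclose(BP, BP.T)          # BP is indeed symmetric

skewB = 0.5 * (B - B.T)
assert np.allclose(dO(P, BP), 0, atol=1e-4)   # dO_P(BP) = 0, as proved
assert not np.allclose(skewB, 0)              # ...yet skew(B) != 0
```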
Comment: Equation $(1)$ implies that one can equivalently focus on the "dual" problem of computing $dP_A$ instead of $dO_A$. Here is a previous attempt of mine to go in this direction.
soft-question closed-form matrix-calculus matrix-decomposition orthogonal-matrices
asked Jan 17 at 10:28


Asaf Shachar
1 Answer
Define the matrices
$$\eqalign{
F &= A(A^TA)^{-1/2} \cr
Q &= I-AA^+ = Q^T \cr
V^T &= A^+ \cr
}$$
where $A^+$ denotes the pseudoinverse.
Then
$$\eqalign{
FF^T &= A(A^TA)^{-1}A^T = AA^+ \cr
d(FF^T) &= d(AA^+) \cr
dF\,F^T+F\,dF^T &= Q\,dA\,A^+ + V\,dA^T\,Q \cr
}$$
At this point, you have stressed that you don't want to use vectorization, so let's try solving it using 4th-order tensors.
There are three isotropic 4th-order tensors $({\mathcal H},{\mathcal J},{\mathcal K})$, whose components can be expressed in terms of Kronecker deltas:
$$\eqalign{
{\mathcal H}_{ijkl} &= \delta_{ik}\,\delta_{jl} \cr
{\mathcal J}_{ijkl} &= \delta_{ij}\,\delta_{kl} \cr
{\mathcal K}_{ijkl} &= \delta_{il}\,\delta_{jk} \cr
}$$
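The three isotropic tensors and their basic double-contraction actions can be built with `einsum`; a sketch assuming NumPy, with `dc` a helper name introduced here for the double contraction:

```python
import numpy as np

n = 3
I = np.eye(n)
# 4th-order isotropic tensors in index form: H_ijkl, J_ijkl, K_ijkl
H = np.einsum('ik,jl->ijkl', I, I)
J = np.einsum('ij,kl->ijkl', I, I)
K = np.einsum('il,jk->ijkl', I, I)

X = np.random.default_rng(3).standard_normal((n, n))
dc = lambda T, M: np.einsum('ijkl,kl->ij', T, M)   # double contraction T:M

assert np.allclose(dc(H, X), X)                 # H:X = X (identity)
assert np.allclose(dc(J, X), np.trace(X) * I)   # J:X = tr(X) I
assert np.allclose(dc(K, X), X.T)               # K:X = X^T (transposer)
```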
We can use these to rearrange the last differential as
$$({\mathcal H}F + F{\mathcal K}):dF = (Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K}):dA$$
Now we need to calculate a 4th-order tensor ${\mathcal P}$ which is the inverse (under the double-contraction product) of the tensor on the LHS, i.e.
$${\mathcal H} = {\mathcal P}:\big({\mathcal H}F + F{\mathcal K}\big)$$
This allows us to isolate $dF$ and calculate the gradient as
$$\eqalign{
dF &= \Big({\mathcal P}:(Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K})\Big):dA \cr
\frac{\partial F}{\partial A} &= {\mathcal P}:(Q{\mathcal H}V + V{\mathcal H}Q:{\mathcal K}) \cr
}$$
In the above, juxtaposition indicates a single-contraction product, while a colon represents double contraction. Here are some examples in component form:
$$\eqalign{
{\mathcal A} &= {\mathcal B}:{\mathcal C} &\implies {\mathcal A}_{ijmn} &= \sum_{kl}{\mathcal B}_{ijkl}\,{\mathcal C}_{klmn} \cr
\alpha &= B:C &\implies \alpha &= \sum_{ij}B_{ij}\,C_{ij} \cr
A &= BC &\implies A_{ik} &= \sum_{j}B_{ij}\,C_{jk} \cr
{\mathcal A} &= B\,{\mathcal C} &\implies {\mathcal A}_{iklm} &= \sum_{j}B_{ij}\,{\mathcal C}_{jklm} \cr
}$$
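As a sanity check on these conventions, one can verify numerically that the left-hand-side tensor $({\mathcal H}F + F{\mathcal K})$ really does reproduce $dF\,F^T + F\,dF^T$ under double contraction; a sketch assuming NumPy:

```python
import numpy as np

n = 3
I = np.eye(n)
H = np.einsum('ik,jl->ijkl', I, I)
K = np.einsum('il,jk->ijkl', I, I)

rng = np.random.default_rng(4)
F = rng.standard_normal((n, n))
dF = rng.standard_normal((n, n))

# juxtaposition = single contraction, per the component-form examples
HF = np.einsum('ijkl,lm->ijkm', H, F)      # (H F)_{ijkm} = sum_l H_ijkl F_lm
FK = np.einsum('ij,jklm->iklm', F, K)      # (F K)_{iklm} = sum_j F_ij K_jklm
M = HF + FK

lhs = np.einsum('ijkl,kl->ij', M, dF)      # (H F + F K) : dF
assert np.allclose(lhs, dF @ F.T + F @ dF.T)
```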
Thanks, but I am not sure I understand some things: as I defined it, the matrix $A$ is invertible, and so $A^+=A^{-1}$ (in particular $Q=0$). Also, $FF^T=\text{Id}$, since $F$ is orthogonal; do you gain anything from writing $FF^T=AA^+$?
– Asaf Shachar
Jan 18 at 5:58
Also, differentiating $d(FF^T) = d(AA^+)$ should result in $dF\,F^T+F\,dF^T = dA\,A^+ + A\,dA^{+}$, while you wrote $dF\,F^T+F\,dF^T = Q\,dA\,A^+ + V\,dA^T\,Q$...
– Asaf Shachar
Jan 18 at 6:02
If $A^+=A^{-1}$ then $Q=0$, in which case you need to find the tensor equivalent of the nullspace projector to solve the problem. Let ${\mathcal M}=({\mathcal H}F+F{\mathcal K})$ and re-purpose ${\mathcal P}$ to be the pseudoinverse rather than the regular inverse. So the gradient is
$$\eqalign{
{\mathcal M}:{\mathcal P}:{\mathcal M} &= {\mathcal M} \quad &(\text{defines the pseudoinverse}) \cr
{\mathcal M}:dF &= 0 \quad &(dF \text{ lies in the nullspace}) \cr
dF &= ({\mathcal H}-{\mathcal P}:{\mathcal M}):dA \cr
\frac{\partial F}{\partial A} &= {\mathcal H}-{\mathcal P}:{\mathcal M} \cr
}$$
– lynn
Jan 19 at 22:26
edited Jan 17 at 22:10
answered Jan 17 at 21:47
lynn