Second order approximation of log det X












2












$begingroup$


I'm trying to follow the derivation of second order approximation of log det X from Boyd's "Convex Optimization", p.658 in http://www.stanford.edu/~boyd/cvxbook/



How is the last step derived? IE, where does the trace expression come from?












share|cite|improve this question











$endgroup$












  • $begingroup$
    This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
    $endgroup$
    – Bombyx mori
    Nov 29 '12 at 6:23
















2












$begingroup$


I'm trying to follow the derivation of second order approximation of log det X from Boyd's "Convex Optimization", p.658 in http://www.stanford.edu/~boyd/cvxbook/



How is the last step derived? IE, where does the trace expression come from?












share|cite|improve this question











$endgroup$












  • $begingroup$
    This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
    $endgroup$
    – Bombyx mori
    Nov 29 '12 at 6:23














2












2








2


4



$begingroup$


I'm trying to follow the derivation of second order approximation of log det X from Boyd's "Convex Optimization", p.658 in http://www.stanford.edu/~boyd/cvxbook/



How is the last step derived? IE, where does the trace expression come from?












share|cite|improve this question











$endgroup$




I'm trying to follow the derivation of second order approximation of log det X from Boyd's "Convex Optimization", p.658 in http://www.stanford.edu/~boyd/cvxbook/



How is the last step derived? IE, where does the trace expression come from?









linear-algebra convex-analysis






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Jan 8 at 19:32









Glorfindel

3,41981830




3,41981830










asked Nov 29 '12 at 5:07









Yaroslav BulatovYaroslav Bulatov

1,87411526




1,87411526












  • $begingroup$
    This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
    $endgroup$
    – Bombyx mori
    Nov 29 '12 at 6:23


















  • $begingroup$
    This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
    $endgroup$
    – Bombyx mori
    Nov 29 '12 at 6:23
















$begingroup$
This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
$endgroup$
– Bombyx mori
Nov 29 '12 at 6:23




$begingroup$
This should be just the trace of log[x]. A relevant question is here:math.stackexchange.com/questions/38701/…
$endgroup$
– Bombyx mori
Nov 29 '12 at 6:23










1 Answer
1






active

oldest

votes


















2












$begingroup$

Short answer: The trace gives the scalar product on the space of matrices: $langle X,Y rangle = mathrm{tr}(X^top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $langle X,Y rangle = mathrm{tr}(XY)$.



Long answer, with all the gory details: Given a function $f:mathrm S_n^{++}tomathbf R$, the link between the gradient $nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $Uin V$,
$$ d_Xf(U) = langle nabla_Xf,U rangle. $$
For your function $f$, since you know the gradient, you can write the differential:
$$ d_Xf(U) = langle X^{-1},U rangle = mathrm{tr}(X^{-1}U). $$



What about the second order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:mathrm S_n^{++}tomathrm L(mathrm M_n,mathbf R)$, defined by $df(X) = Vmapsto mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+Delta X)$, and take the part that varies linearly in $Delta X$. Since $df(X+Delta X)$ is a function $mathrm M_ntomathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+Delta X)(V) = mathrm{tr}left[ (X+Delta X)^{-1} V right] $$
and use the approximation from the passage you cited:
begin{align*}
df(X+Delta X)(V)
&simeq mathrm{tr}left[ left(X^{-1} - X^{-1}(Delta X)X^{-1}right) V right]\
&= mathrm{tr}(X^{-1}V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V)\
&= df(X)(V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V).
end{align*}
And we just see that the part that varies linearly in $Delta X$ is the $-mathrm{tr}(cdots)$. So the differential of $df$ at $X$ is the function $d^2_Xf:mathrm S_n^{++}tomathrm L(mathrm M_n, mathrm L(mathrm M_n,mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -mathrm{tr}(X^{-1}UX^{-1}V). $$






share|cite|improve this answer









$endgroup$













  • $begingroup$
    It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
    $endgroup$
    – loup blanc
    Jan 9 at 14:52











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f247043%2fsecond-order-approximation-of-log-det-x%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2












$begingroup$

Short answer: The trace gives the scalar product on the space of matrices: $langle X,Y rangle = mathrm{tr}(X^top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $langle X,Y rangle = mathrm{tr}(XY)$.



Long answer, with all the gory details: Given a function $f:mathrm S_n^{++}tomathbf R$, the link between the gradient $nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $Uin V$,
$$ d_Xf(U) = langle nabla_Xf,U rangle. $$
For your function $f$, since you know the gradient, you can write the differential:
$$ d_Xf(U) = langle X^{-1},U rangle = mathrm{tr}(X^{-1}U). $$



What about the second order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:mathrm S_n^{++}tomathrm L(mathrm M_n,mathbf R)$, defined by $df(X) = Vmapsto mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+Delta X)$, and take the part that varies linearly in $Delta X$. Since $df(X+Delta X)$ is a function $mathrm M_ntomathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+Delta X)(V) = mathrm{tr}left[ (X+Delta X)^{-1} V right] $$
and use the approximation from the passage you cited:
begin{align*}
df(X+Delta X)(V)
&simeq mathrm{tr}left[ left(X^{-1} - X^{-1}(Delta X)X^{-1}right) V right]\
&= mathrm{tr}(X^{-1}V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V)\
&= df(X)(V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V).
end{align*}
And we just see that the part that varies linearly in $Delta X$ is the $-mathrm{tr}(cdots)$. So the differential of $df$ at $X$ is the function $d^2_Xf:mathrm S_n^{++}tomathrm L(mathrm M_n, mathrm L(mathrm M_n,mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -mathrm{tr}(X^{-1}UX^{-1}V). $$






share|cite|improve this answer









$endgroup$













  • $begingroup$
    It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
    $endgroup$
    – loup blanc
    Jan 9 at 14:52
















2












$begingroup$

Short answer: The trace gives the scalar product on the space of matrices: $langle X,Y rangle = mathrm{tr}(X^top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $langle X,Y rangle = mathrm{tr}(XY)$.



Long answer, with all the gory details: Given a function $f:mathrm S_n^{++}tomathbf R$, the link between the gradient $nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $Uin V$,
$$ d_Xf(U) = langle nabla_Xf,U rangle. $$
For your function $f$, since you know the gradient, you can write the differential:
$$ d_Xf(U) = langle X^{-1},U rangle = mathrm{tr}(X^{-1}U). $$



What about the second order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:mathrm S_n^{++}tomathrm L(mathrm M_n,mathbf R)$, defined by $df(X) = Vmapsto mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+Delta X)$, and take the part that varies linearly in $Delta X$. Since $df(X+Delta X)$ is a function $mathrm M_ntomathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+Delta X)(V) = mathrm{tr}left[ (X+Delta X)^{-1} V right] $$
and use the approximation from the passage you cited:
begin{align*}
df(X+Delta X)(V)
&simeq mathrm{tr}left[ left(X^{-1} - X^{-1}(Delta X)X^{-1}right) V right]\
&= mathrm{tr}(X^{-1}V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V)\
&= df(X)(V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V).
end{align*}
And we just see that the part that varies linearly in $Delta X$ is the $-mathrm{tr}(cdots)$. So the differential of $df$ at $X$ is the function $d^2_Xf:mathrm S_n^{++}tomathrm L(mathrm M_n, mathrm L(mathrm M_n,mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -mathrm{tr}(X^{-1}UX^{-1}V). $$






share|cite|improve this answer









$endgroup$













  • $begingroup$
    It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
    $endgroup$
    – loup blanc
    Jan 9 at 14:52














2












2








2





$begingroup$

Short answer: The trace gives the scalar product on the space of matrices: $langle X,Y rangle = mathrm{tr}(X^top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $langle X,Y rangle = mathrm{tr}(XY)$.



Long answer, with all the gory details: Given a function $f:mathrm S_n^{++}tomathbf R$, the link between the gradient $nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $Uin V$,
$$ d_Xf(U) = langle nabla_Xf,U rangle. $$
For your function $f$, since you know the gradient, you can write the differential:
$$ d_Xf(U) = langle X^{-1},U rangle = mathrm{tr}(X^{-1}U). $$



What about the second order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:mathrm S_n^{++}tomathrm L(mathrm M_n,mathbf R)$, defined by $df(X) = Vmapsto mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+Delta X)$, and take the part that varies linearly in $Delta X$. Since $df(X+Delta X)$ is a function $mathrm M_ntomathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+Delta X)(V) = mathrm{tr}left[ (X+Delta X)^{-1} V right] $$
and use the approximation from the passage you cited:
begin{align*}
df(X+Delta X)(V)
&simeq mathrm{tr}left[ left(X^{-1} - X^{-1}(Delta X)X^{-1}right) V right]\
&= mathrm{tr}(X^{-1}V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V)\
&= df(X)(V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V).
end{align*}
And we just see that the part that varies linearly in $Delta X$ is the $-mathrm{tr}(cdots)$. So the differential of $df$ at $X$ is the function $d^2_Xf:mathrm S_n^{++}tomathrm L(mathrm M_n, mathrm L(mathrm M_n,mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -mathrm{tr}(X^{-1}UX^{-1}V). $$






share|cite|improve this answer









$endgroup$



Short answer: The trace gives the scalar product on the space of matrices: $langle X,Y rangle = mathrm{tr}(X^top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $langle X,Y rangle = mathrm{tr}(XY)$.



Long answer, with all the gory details: Given a function $f:mathrm S_n^{++}tomathbf R$, the link between the gradient $nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $Uin V$,
$$ d_Xf(U) = langle nabla_Xf,U rangle. $$
For your function $f$, since you know the gradient, you can write the differential:
$$ d_Xf(U) = langle X^{-1},U rangle = mathrm{tr}(X^{-1}U). $$



What about the second order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:mathrm S_n^{++}tomathrm L(mathrm M_n,mathbf R)$, defined by $df(X) = Vmapsto mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+Delta X)$, and take the part that varies linearly in $Delta X$. Since $df(X+Delta X)$ is a function $mathrm M_ntomathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+Delta X)(V) = mathrm{tr}left[ (X+Delta X)^{-1} V right] $$
and use the approximation from the passage you cited:
begin{align*}
df(X+Delta X)(V)
&simeq mathrm{tr}left[ left(X^{-1} - X^{-1}(Delta X)X^{-1}right) V right]\
&= mathrm{tr}(X^{-1}V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V)\
&= df(X)(V) - mathrm{tr}(X^{-1}(Delta X)X^{-1}V).
end{align*}
And we just see that the part that varies linearly in $Delta X$ is the $-mathrm{tr}(cdots)$. So the differential of $df$ at $X$ is the function $d^2_Xf:mathrm S_n^{++}tomathrm L(mathrm M_n, mathrm L(mathrm M_n,mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -mathrm{tr}(X^{-1}UX^{-1}V). $$







share|cite|improve this answer












share|cite|improve this answer



share|cite|improve this answer










answered Nov 29 '12 at 8:19









jathdjathd

1,59679




1,59679












  • $begingroup$
    It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
    $endgroup$
    – loup blanc
    Jan 9 at 14:52


















  • $begingroup$
    It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
    $endgroup$
    – loup blanc
    Jan 9 at 14:52
















$begingroup$
It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
$endgroup$
– loup blanc
Jan 9 at 14:52




$begingroup$
It's better to consider $d^2f_X$ as a symmetric bilinear form; $d^2f_X(U,V)=-tr(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-tr((X^{-1}U)^2)$.
$endgroup$
– loup blanc
Jan 9 at 14:52


















draft saved

draft discarded




















































Thanks for contributing an answer to Mathematics Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f247043%2fsecond-order-approximation-of-log-det-x%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

'app-layout' is not a known element: how to share Component with different Modules

android studio warns about leanback feature tag usage required on manifest while using Unity exported app?

WPF add header to Image with URL pettitions [duplicate]