Second-order approximation of log det X
I'm trying to follow the derivation of the second-order approximation of $\log\det X$ in Boyd's *Convex Optimization*, p. 658 (http://www.stanford.edu/~boyd/cvxbook/). How is the last step derived? That is, where does the trace expression come from?

linear-algebra convex-analysis
edited Jan 8 at 19:32 by Glorfindel
asked Nov 29 '12 at 5:07 by Yaroslav Bulatov
This should be just the trace of log[x]. A relevant question is here: math.stackexchange.com/questions/38701/…
– Bombyx mori, Nov 29 '12 at 6:23
1 Answer
Short answer: The trace gives the scalar product on the space of matrices: $\langle X,Y\rangle = \mathrm{tr}(X^\top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $\langle X,Y\rangle = \mathrm{tr}(XY)$.
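As a quick sanity check (a NumPy sketch of my own, not from the book): the trace pairing is just the entrywise dot product of the matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4))

# <X, Y> = tr(X^T Y) equals the sum of entrywise products X_ij * Y_ij.
assert np.isclose(np.trace(X.T @ Y), np.sum(X * Y))
```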
Long answer, with all the gory details: Given a function $f:\mathrm S_n^{++}\to\mathbf R$, the link between the gradient $\nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $U$ in the ambient space $\mathrm M_n$,
$$ d_Xf(U) = \langle \nabla_Xf, U \rangle. $$
For your function $f$, since you know the gradient ($\nabla_Xf = X^{-1}$), you can write the differential:
$$ d_Xf(U) = \langle X^{-1}, U \rangle = \mathrm{tr}(X^{-1}U). $$
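You can verify this identity numerically with a central finite difference (again my own NumPy sketch, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
X = A @ A.T + 5 * np.eye(5)        # a symmetric positive definite point
U = rng.standard_normal((5, 5))
U = (U + U.T) / 2                  # a symmetric direction

f = lambda M: np.linalg.slogdet(M)[1]   # log det, computed stably

t = 1e-5
fd = (f(X + t * U) - f(X - t * U)) / (2 * t)   # central finite difference
exact = np.trace(np.linalg.solve(X, U))        # tr(X^{-1} U)
assert np.isclose(fd, exact, rtol=1e-6)
```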
What about the second-order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:\mathrm S_n^{++}\to\mathrm L(\mathrm M_n,\mathbf R)$, defined by $df(X) = V\mapsto \mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+\Delta X)$, and take the part that varies linearly in $\Delta X$. Since $df(X+\Delta X)$ is a function $\mathrm M_n\to\mathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$:
$$ df(X+\Delta X)(V) = \mathrm{tr}\left[ (X+\Delta X)^{-1} V \right], $$
and use the first-order approximation $(X+\Delta X)^{-1} \simeq X^{-1} - X^{-1}(\Delta X)X^{-1}$ from the passage you cited:
\begin{align*}
df(X+\Delta X)(V)
&\simeq \mathrm{tr}\left[ \left(X^{-1} - X^{-1}(\Delta X)X^{-1}\right) V \right]\\
&= \mathrm{tr}(X^{-1}V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V)\\
&= df(X)(V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V).
\end{align*}
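(Aside: a quick NumPy check, not from the book, that this expansion of the inverse really has second-order error:)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)        # positive definite, hence invertible
Xinv = np.linalg.inv(X)

for eps in [1e-2, 1e-3, 1e-4]:
    dX = eps * rng.standard_normal((4, 4))
    exact = np.linalg.inv(X + dX)
    approx = Xinv - Xinv @ dX @ Xinv             # first-order expansion
    print(eps, np.linalg.norm(exact - approx))   # shrinks like eps**2
```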
Back to the derivation: the part that varies linearly in $\Delta X$ is the $-\mathrm{tr}(\cdots)$ term. So the differential of $df$ at $X$ is the bilinear map $d^2_Xf \in \mathrm L(\mathrm M_n, \mathrm L(\mathrm M_n,\mathbf R))$ defined by
$$ d^2_Xf(U)(V) = -\mathrm{tr}(X^{-1}UX^{-1}V). $$
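Putting the pieces together gives the second-order approximation $\log\det(X+tU) \simeq \log\det X + t\,\mathrm{tr}(X^{-1}U) - \tfrac{t^2}{2}\,\mathrm{tr}\big((X^{-1}U)^2\big)$, which you can also check numerically (my own NumPy sketch; the error should shrink like $t^3$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
X = A @ A.T + 5 * np.eye(5)
U = rng.standard_normal((5, 5))
U = (U + U.T) / 2

f = lambda M: np.linalg.slogdet(M)[1]   # log det
Z = np.linalg.solve(X, U)               # X^{-1} U

for t in [1e-1, 1e-2, 1e-3]:
    taylor = f(X) + t * np.trace(Z) - 0.5 * t**2 * np.trace(Z @ Z)
    print(t, abs(f(X + t * U) - taylor))    # error is O(t**3)
```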
answered Nov 29 '12 at 8:19 by jathd
It's better to consider $d^2f_X$ as a symmetric bilinear form: $d^2f_X(U,V) = -\mathrm{tr}(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U) = -\mathrm{tr}((X^{-1}U)^2)$.
– loup blanc, Jan 9 at 14:52