Compute the gradient of $f(x)=\|\text{diag}(x)\|$ with the chain rule
Consider the function $f:\mathbb{R}^n\to\mathbb{R}$ given by $f(x)=\|\text{diag}(x)\|$, where $\text{diag}(x)\in\mathbb{R}^{n\times n}$ is the diagonal matrix with diagonal entries $x_1,x_2,\dots,x_n$, and $\|\cdot\|$ is the spectral norm (matrix 2-norm).
Since the spectral norm of a matrix is its largest singular value, and the singular values of a (square) diagonal matrix are the absolute values of the diagonal entries, we see that $f(x)=\|x\|_\infty$, where $\|\cdot\|_\infty$ is the (vector) sup-norm. In this form, it is easier to deduce the properties of $f$--in particular, it is differentiable at any point $x\in\mathbb{R}^n$ where the largest element of $x$ (in absolute value) is unique. At such a point, the gradient of $f$ is given by
$$
\nabla f(x)=\text{sgn}(x_k)e_k
$$
where $k$ is the index of the (unique) largest entry of $x$ (in absolute value), $e_k$ is the $k^\text{th}$ standard basis vector in $\mathbb{R}^n$, and $\text{sgn}(\cdot)$ is the sign function.
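A quick numerical sanity check of this formula (a minimal NumPy sketch added for illustration, not part of the original post; `f` and `grad_f` are ad hoc names):

```python
import numpy as np

def f(x):
    # spectral norm of diag(x); equals the sup-norm ||x||_inf
    return np.linalg.norm(np.diag(x), ord=2)

def grad_f(x):
    # claimed gradient sgn(x_k) e_k, with k the unique largest |entry|
    k = np.argmax(np.abs(x))
    g = np.zeros_like(x)
    g[k] = np.sign(x[k])
    return g

x = np.array([0.3, -2.0, 1.1, 0.7])   # unique largest |entry| at k = 1
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(len(x))])
print(np.allclose(fd, grad_f(x), atol=1e-4))   # True
```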
I want to deduce the above expression for the gradient using the chain rule applied to $f(x)=(g\circ h)(x)$, where $g:\mathbb{R}^{n\times n}\to\mathbb{R}$ is given by $g(A)=\|A\|$, and $h:\mathbb{R}^n\to\mathbb{R}^{n\times n}$ is given by $h(x)=\text{diag}(x)$.
The "Jacobian" of $h$ is a three-dimensional object, where
$$
\frac{\partial[h(x)]_{ij}}{\partial x_k}=\begin{cases}1,&i=j=k,\\0,&\text{else.}\end{cases}
$$
The function $g$ is differentiable at any point $A$ where $A$ has a unique largest singular value, in which case the gradient(?) is given by
$$
\nabla g(A)=uv^\text{T},
$$
where $u$ and $v$ are the left and right singular vectors (respectively) corresponding to the (unique) largest singular value of $A$.
So I essentially have a three-dimensional object and a two-dimensional object, and I want to apply the chain rule to get the gradient, a one-dimensional object (i.e. a vector). A straightforward application suggests "multiplying them together" (not sure that concept is even defined), which seems like it would produce a matrix. What simple thing am I missing here?
derivatives matrix-calculus chain-rule
asked Jan 22 at 1:23 by David M.
1 Answer
A hint rather than a complete answer
It might help to think of $\Bbb R^{n \times n}$ as $\Bbb R^{n^2}$.
Now $h$ is a function from $\Bbb R^n$ to $\Bbb R^{n^2}$ and $g$ is a function from $\Bbb R^{n^2}$ to $\Bbb R$, so the composite goes from $\Bbb R^n$ to $\Bbb R$. Can you work out the $k$th partial derivative of this?
When you do, you may find out that the result you get is surprisingly closely related to matrix multiplication (of some sort) of the derivatives of $h$ and $g$...or you may not.
Just to get you started, $$\nabla g(A)_{ni + j} = u_i v_j,$$
where I'm working with indices that go from $0$ to $n-1$ here.
Can you now work out the $(ni + j)$th entry of the $k$th partial derivative of $g \circ h$?
answered Jan 22 at 1:40 by John Hughes
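To make the hint concrete, here is a minimal sketch (an editorial addition, not part of the answer) that realizes the identification of $\Bbb R^{n\times n}$ with $\Bbb R^{n^2}$ via NumPy's row-major flattening, so that entry $(i,j)$ of a matrix sits at position $ni+j$; the names `jacobian_h` and `jacobian_g` are ad hoc:

```python
import numpy as np

def jacobian_h(n):
    # J_h in R^{n^2 x n}: column k is the row-major flattening of
    # d(diag(x))/dx_k, i.e. of the matrix E_{kk} (a single 1 at (k, k)).
    J = np.zeros((n * n, n))
    for k in range(n):
        J[n * k + k, k] = 1.0
    return J

def jacobian_g(A):
    # J_g in R^{1 x n^2}: the row-major flattening of u v^T, where u and v
    # belong to the (assumed unique) largest singular value of A.
    U, s, Vt = np.linalg.svd(A)       # singular values in descending order
    u, v = U[:, 0], Vt[0, :]
    return np.outer(u, v).reshape(1, -1)
```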
– David M. (Jan 22 at 18:53): Thought I might have to resort to this--was being lazy and trying to avoid "index counting" from stacking matrices into vectors. Indeed this works!
– John Hughes (Jan 22 at 19:24): And when you were done, did you discover that there was a matrix multiplication hidden in there? (I'm asking seriously...I didn't bother doing it myself, and wonder what the result was...)
– David M. (Jan 22 at 19:27): Yes, it works out following the conventions of the usual multi-dimensional chain rule. The resulting gradient vector is $(J_g J_h)^\text{T}$, where $J_g\in\mathbb{R}^{1\times n^2}$ is the outer product of $u$ and $v$ stacked columnwise, and $J_h\in\mathbb{R}^{n^2\times n}$, where each row of $J_h$ is the gradient of a component of the matrix $\text{diag}(x)$ stacked columnwise (lots of words, but basically what you would expect).
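A small check of the comment's conclusion (again an editorial sketch, reusing the ad hoc `jacobian_g` and `jacobian_h` helpers from the sketch above): the chain-rule product recovers $\text{sgn}(x_k)e_k$.

```python
import numpy as np

# assumes jacobian_g and jacobian_h from the earlier sketch are in scope
x = np.array([0.3, -2.0, 1.1, 0.7])   # unique largest |entry| at k = 1
J_g = jacobian_g(np.diag(x))          # 1 x n^2
J_h = jacobian_h(len(x))              # n^2 x n
grad = (J_g @ J_h).ravel()            # (J_g J_h)^T as a flat vector
print(grad)                           # approx. [ 0. -1.  0.  0.] = sgn(x_1) e_1
```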