Proximal gradient method justification
If $f$ and $g$ are respectively a differentiable function and a convex, lower semi-continuous function, then the algorithm defined by
$$ x^{k+1} = \text{prox}_{\gamma g}\left[x^{k} - \gamma \nabla f(x^{k})\right]$$
converges to $\text{argmin}[f+g]$.
This is justified by the fact that if $x^{*}$ is a minimizer of $f+g$, then
$$ x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right].$$
But I do not understand this relation. Why is it true?
That is, why does $x^{*} = \text{argmin}[f+g] \Leftrightarrow x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$ hold?
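(To make the update rule concrete, here is a small Python sketch of the iteration as I understand it; `prox_g(v, gamma)` is assumed to be a routine computing $\text{prox}_{\gamma g}(v)$, and `grad_f` computes $\nabla f$ — purely an illustration of the update, not a claim about implementation details.)

```python
def proximal_gradient(grad_f, prox_g, x0, gamma, n_iter=100):
    """Iterate x_{k+1} = prox_{gamma*g}(x_k - gamma * grad_f(x_k))."""
    x = x0
    for _ in range(n_iter):
        x = prox_g(x - gamma * grad_f(x), gamma)  # one proximal gradient step
    return x

# Example: f(x) = 0.5*(x - 2)^2 and g(x) = |x|, whose prox is soft-thresholding.
x_min = proximal_gradient(
    lambda x: x - 2.0,
    lambda v, gamma: (1.0 if v >= 0 else -1.0) * max(abs(v) - gamma, 0.0),
    x0=0.0, gamma=0.5)
print(x_min)  # approaches 1.0, the minimizer of f + g
```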
optimization convex-analysis gradient-descent semicontinuous-functions
2 Answers
I will show below that if $x^* = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$ then $x^* \in \text{argmin}\, f(x)+g(x)$.
Plugging in the definition of the proximal operator, we have
$$\text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right] = \text{argmin}_{x} \left\{\gamma g(x) + \frac{1}{2}\left\|x- \left(x^* - \gamma \nabla f(x^*)\right)\right\|^2\right\}.$$
Since $x^* = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$, Fermat's rule applied to this minimization problem gives, at its minimizer $x$,
$$0 \in \partial \left(\gamma g(x)\right) + \left(x-\left(x^*-\gamma\nabla f(x^*)\right)\right).$$
Substituting $x=x^*$, we get
$$0\in \gamma \,\partial g(x^*) + \gamma \nabla f(x^*).$$
This is equivalent to saying that $0 \in \partial F(x^*)$, where $F(x) = g(x)+f(x)$, so $x^*$ is a minimizer. The other direction of the proof is very similar.
Note: the last step, passing to $0 \in \partial F(x^*)$, relies on a subdifferential sum rule and need not always hold; it does hold, for instance, when $f$ is convex as well as differentiable.
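As a sanity check of this fixed-point characterization, here is a small numerical sketch for the one-dimensional example $f(x) = \tfrac{1}{2}(x-b)^2$, $g(x) = \lambda|x|$, where both the minimizer of $f+g$ and the prox of $g$ are given by soft-thresholding (the helper `soft_threshold` and the chosen constants are illustrative assumptions, not part of the argument above):

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * |.| (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

b, lam = 2.0, 0.5
x_star = soft_threshold(b, lam)          # closed-form minimizer of 0.5*(x-b)^2 + lam*|x|

for gamma in (0.1, 0.5, 1.0):
    grad = x_star - b                    # f'(x*) for f(x) = 0.5*(x - b)^2
    rhs = soft_threshold(x_star - gamma * grad, gamma * lam)
    print(gamma, x_star, rhs)            # rhs equals x_star for every gamma > 0
```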
Here's an explanation which assumes that we already understand the idea that the prox-operator of $g$ with parameter $t > 0$ is the operator $(I + t \partial g)^{-1}$, where $\partial g$ is the subdifferential of $g$.
I'll assume that $f$ is convex as well as differentiable, and that $g$ is convex and closed. Let $t > 0$. A point $x$ is a minimizer of $f + g$ if and only if
\begin{align}
&0 \in \nabla f(x) + \partial g(x) \\
\iff {}&x \in x + t \nabla f(x) + t\partial g(x) \\
\iff {}&x - t \nabla f(x) \in (I + t \partial g)(x) \\
\iff {}&x = (I + t \partial g)^{-1}(x - t \nabla f(x)).
\end{align}
The final equation is another way of saying that
$$
x = \text{prox}_{tg}(x - t \nabla f(x)).
$$
We can then solve this equation using fixed point iteration, which yields the proximal gradient method.
By the way, if this derivation of the proximal gradient method doesn't seem intuitive, there are other ways to discover the proximal gradient method that are more obvious. The viewpoint given here has the advantage that it shows that the proximal gradient method is a fixed point iteration, which helps us with convergence proofs.
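To make the fixed-point viewpoint concrete, here is a minimal sketch of that iteration for the particular choice $f(x) = \tfrac12\|Ax-b\|^2$, $g(x) = \lambda\|x\|_1$ (so the prox is componentwise soft-thresholding); the random data, step size $t$, and tolerance are illustrative assumptions, not part of the argument above:

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1, applied componentwise
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 0.1

# f(x) = 0.5*||Ax - b||^2 has gradient A^T(Ax - b); its gradient is
# Lipschitz with constant ||A^T A||_2, so t = 1/||A^T A||_2 is a safe step.
t = 1.0 / np.linalg.norm(A.T @ A, 2)

x = np.zeros(20)
for _ in range(1000):
    # Fixed-point map: x <- prox_{t g}(x - t * grad f(x))
    x_new = soft_threshold(x - t * A.T @ (A @ x - b), t * lam)
    if np.linalg.norm(x_new - x) < 1e-10:   # stop once x is (numerically) a fixed point
        break
    x = x_new

# At convergence the fixed-point residual is ~0, so x minimizes f + g.
print(np.linalg.norm(x - soft_threshold(x - t * A.T @ (A @ x - b), t * lam)))
```

The stopping test is exactly the fixed-point relation derived above: once the iterate no longer moves under the map $x \mapsto \text{prox}_{tg}(x - t\nabla f(x))$, it satisfies the optimality condition (up to numerical tolerance).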