Firm non-expansiveness in the context of proximal operators
$\newcommand{\prox}{\operatorname{prox}}$
Probably the most remarkable property of the proximal operator is the fixed point property:
The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*)$.
So, indeed, $f$ can be minimized by finding a fixed point of its proximal operator.
http://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf
Question 1. In the paper linked above, the author says:
If $\prox_f$ were a contraction, i.e., Lipschitz continuous with constant less than $1$, repeatedly applying $\prox_f$ would find a (here, unique) fixed point
Why does a Lipschitz constant less than $1$ guarantee that repeatedly applying the proximal operator finds a fixed point?
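For intuition, here is a toy numerical sketch (my own example, not from the paper): for the strongly convex $f(x) = x^2$, the proximal operator has the closed form $\prox_f(v) = v/3$, a contraction with constant $1/3 < 1$, and the fixed-point iteration converges geometrically to the minimizer $x^* = 0$:

```python
# Toy example (not from the paper): f(x) = x^2 is strongly convex,
# and its proximal operator has the closed form prox_f(v) = v / 3,
# a contraction with Lipschitz constant 1/3 < 1.

def prox_f(v):
    # argmin_x  x^2 + (1/2)(x - v)^2  =>  2x + x - v = 0  =>  x = v / 3
    return v / 3.0

x = 10.0
for _ in range(50):
    x = prox_f(x)  # fixed-point iteration x_{k+1} = prox_f(x_k)

# (1/3)^50 * 10 is vanishingly small, so x is essentially the minimizer 0.
print(abs(x) < 1e-12)
```

The error shrinks by exactly the Lipschitz constant $1/3$ per iteration, which is the geometric rate that the Banach fixed-point theorem promises.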
Question 2. Let me quote a paragraph from the same paper:
It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:
$\|\prox_f(x) - \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) - \prox_f(y))$
$\forall x,y \in \mathbb{R}^n$
Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant $1$). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it turns out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$, has the same fixed points as $N$, and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e. the sequence
$x^{k+1} := (1-\alpha)x^k + \alpha N(x^k)$
will converge to a fixed point of $N$. Put differently, damped iteration of a nonexpansive operator will converge to one of its fixed points.
Operators of the form $(1-\alpha)I + \alpha N$, where $N$ is nonexpansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.
Firmly nonexpansive operators are averaged: indeed, they are precisely the $(1/2)$-averaged operators.
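The quoted paragraph can be checked numerically. The sketch below (my own toy example, taking $N$ to be a 90-degree rotation) shows plain iteration of a nonexpansive operator cycling forever, while the damped iteration with $\alpha = 1/2$ converges to the fixed point:

```python
import math

# Toy example: N is rotation by 90 degrees, a nonexpansive operator
# whose only fixed point is the origin.

def N(p):
    x, y = p
    return (-y, x)

def norm(p):
    return math.hypot(p[0], p[1])

# Plain iteration of N: the point just cycles at distance 1 forever.
p = (1.0, 0.0)
for _ in range(100):
    p = N(p)

# Damped iteration x_{k+1} = (1 - alpha) x_k + alpha N(x_k), alpha = 1/2.
alpha = 0.5
q = (1.0, 0.0)
for _ in range(100):
    nx, ny = N(q)
    q = ((1 - alpha) * q[0] + alpha * nx,
         (1 - alpha) * q[1] + alpha * ny)

print(norm(p))          # plain iteration: still at distance 1.0
print(norm(q) < 1e-10)  # damped iteration: converged to the origin
```

In this particular example the averaged operator $T = \tfrac{1}{2}(I + N)$ happens to be a genuine contraction (its eigenvalues have modulus $\sqrt{2}/2$), which is why convergence is so fast; in general averaged iterations converge, just not necessarily at a geometric rate.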
Why "unless $f$ is strongly convex"?
What is the intuition behind the given expression for firm nonexpansiveness?
How can you show that firmly nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?
Is anyone aware of a proof that the proximal map is firmly nonexpansive?
multivariable-calculus operator-theory convex-analysis convex-optimization fixed-point-theorems
asked Aug 27 '14 at 11:23 by trembik
3 Answers
Here's a short proof that proximal operators are firmly nonexpansive. Let $f:\mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be closed convex proper (CCP). Assume that
\begin{equation}
u_1 = \operatorname{prox}_f(x_1) \quad \text{and} \quad u_2 = \operatorname{prox}_f(x_2).
\end{equation}
Equivalently,
\begin{equation}
\tag{$\spadesuit$} x_1 - u_1 \in \partial f(u_1) \quad \text{and} \quad x_2 - u_2 \in \partial f(u_2).
\end{equation}
Now we use the (fundamentally important) fact that $\partial f$ is a monotone mapping, which tells us that
\begin{align}
&\langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 \rangle \geq 0 \\
\tag{$\heartsuit$}\implies{}& \langle x_1 - x_2, u_1 - u_2 \rangle \geq \| u_1 - u_2 \|_2^2.
\end{align}
This shows that $\operatorname{prox}_f$ is firmly nonexpansive.
Comments:
- When working with prox-operators, often the very first step is to express $u = \operatorname{prox}_f(x)$ in the equivalent form $x - u \in \partial f(u)$. This is a statement you can work with, involving more primitive notions.
- The fact that $\partial f$ is monotone is so fundamental that it is extremely tempting to invoke monotonicity once we have written down equation $(\spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators": set-valued mappings (such as $\partial f$) that satisfy the monotonicity property. From this viewpoint, the most important fact about $\partial f$ is that it is monotone.
- The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(\heartsuit)$ rather quickly, and would notice that $(\heartsuit)$ implies $\operatorname{prox}_f$ is nonexpansive (because $\langle x_1 - x_2, u_1 - u_2 \rangle \leq \|x_1 - x_2\|_2 \| u_1 - u_2 \|_2$ by Cauchy-Schwarz), but it may seem wasteful to replace $(\heartsuit)$ with the inequality $\|u_1 - u_2 \|_2 \leq \|x_1 - x_2 \|_2$, because this latter inequality is less tight.
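A numerical sanity check of $(\heartsuit)$ may also be reassuring. The sketch below uses soft-thresholding, the well-known closed form of $\operatorname{prox}_f$ for $f(x) = \lambda\|x\|_1$; the random test points are purely illustrative:

```python
import random

# Soft-thresholding: the well-known closed form of prox_f for
# f(x) = lam * ||x||_1 (the random testing below is illustrative).
def prox_l1(v, lam=1.0):
    return [max(abs(t) - lam, 0.0) * (1.0 if t >= 0 else -1.0) for t in v]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

random.seed(0)
for _ in range(1000):
    x1 = [random.uniform(-5.0, 5.0) for _ in range(4)]
    x2 = [random.uniform(-5.0, 5.0) for _ in range(4)]
    u1, u2 = prox_l1(x1), prox_l1(x2)
    d = [a - b for a, b in zip(u1, u2)]
    # firm nonexpansiveness: <x1 - x2, u1 - u2> >= ||u1 - u2||_2^2
    assert dot([a - b for a, b in zip(x1, x2)], d) >= dot(d, d) - 1e-12

print("firm nonexpansiveness held on all samples")
```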
answered Aug 22 '16 at 15:23 by littleO (edited Jan 22 at 17:14)
You can prove the (firm) non-expansiveness of the proximal operator as follows:
$\DeclareMathOperator{\prox}{prox}$
Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].
You can think of the proximal operator as a kind of generalised projection operator.
Proof:
We pick two arbitrary points $z_1$, $z_2$. For comparison, the optimality condition for the projection onto a convex set $\mathcal C$ reads
\begin{align}
(z_1 - \prox_f(z_1))^\top (x - \prox_f(z_1)) &\le 0 \quad \forall x \in \mathcal C,
\end{align}
and by definition of the proximal operator $\prox_f(z_2)$ is in $\mathcal C$.
The optimality condition for $\prox_f(z_1)$ is:
\begin{align}
(f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0,
\end{align}
where $f_{\prox_f(z_1)}$ denotes a subgradient of $f$ at $\prox_f(z_1)$,
and for $\prox_f(z_2)$:
\begin{align}
(f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f (z_1) - \prox_f (z_2)) &\ge 0, \\
(z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0.
\end{align}
The subgradient inequalities at $\prox_f(z_1)$ and $\prox_f (z_2)$ are:
\begin{align}
f(\prox_f(z_2)) - f(\prox_f(z_1)) &\ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)), \\
f(\prox_f(z_1)) - f(\prox_f(z_2)) &\ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2)).
\end{align}
Adding the two subgradient inequalities results in:
\begin{align}
0 \le (f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)),
\end{align}
which is just the monotonicity of the subdifferential.
Adding the two optimality conditions leads us to:
\begin{align}
((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0, \\
(-(z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0.
\end{align}
Rearranging, and using $(f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0$ (monotonicity of the subdifferential) to drop the subgradient term:
\begin{align}
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\
&\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)),
\end{align}
which is firm nonexpansiveness.
Finally, Cauchy-Schwarz gives:
\begin{align}
\|\prox_f(z_2) - \prox_f(z_1)\|^2 &\le \|z_2 - z_1\| \, \|\prox_f(z_2) - \prox_f(z_1)\|, \\
\|\prox_f(z_2) - \prox_f(z_1)\| &\le \|z_2 - z_1\|. \qquad \square
\end{align}
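To mirror the projection analogy in this proof, the final inequality can be checked numerically. The sketch below uses projection onto the box $[0,1]^n$, i.e. the prox of the indicator function of that set (a toy choice of mine):

```python
import random

# The prox of the indicator function of C = [0, 1]^n is the Euclidean
# projection onto the box (the "generalised projection" analogy above).
def proj_box(z):
    return [min(max(t, 0.0), 1.0) for t in z]

def dist(a, b):
    return sum((s - t) ** 2 for s, t in zip(a, b)) ** 0.5

random.seed(1)
for _ in range(1000):
    z1 = [random.uniform(-3.0, 3.0) for _ in range(5)]
    z2 = [random.uniform(-3.0, 3.0) for _ in range(5)]
    # nonexpansiveness: ||prox(z2) - prox(z1)|| <= ||z2 - z1||
    assert dist(proj_box(z1), proj_box(z2)) <= dist(z1, z2) + 1e-12

print("nonexpansiveness held on all samples")
```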
Please use `\operatorname{prox}` – Silvia Ghinassi Aug 22 '16 at 14:59
Okay, thanks for the remark! – martinzellner Aug 24 '16 at 9:52
I can explain the first question. The author said "if $\prox_f$ were a contraction", which is not guaranteed. The Banach fixed-point theorem tells us that repeatedly applying a contraction finds its (here, unique) fixed point. So if $\prox_f$ is a contraction, then repeatedly applying the proximal operator will yield a fixed point.
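A minimal sketch of the Banach fixed-point mechanism (with a made-up contraction $T$, not a prox operator): two very different starting points are driven to the same unique fixed point:

```python
# Made-up contraction (not a prox operator): T(x) = x/2 + 1 has
# Lipschitz constant 1/2 < 1 and unique fixed point x* = 2.
def T(x):
    return x / 2.0 + 1.0

a, b = -100.0, 57.0  # two very different starting points
for _ in range(200):
    a, b = T(a), T(b)

# Both sequences land on the same unique fixed point x* = 2.
print(abs(a - 2.0) < 1e-12, abs(b - 2.0) < 1e-12)
```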
$begingroup$
You can proof the non-expansiveness of the proximal operator as following:
$DeclareMathOperator{prox}{prox}$
Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].
You can imagine the the proximal operator as some kind of generalised projection operator.
Proof:
We pick two arbitrary points $z_1$, $z_2$.
begin{align}
(z_1 - prox_f(z_1))^top (x- prox_f(z_1)) &le 0 quad forall x in mathcal C \
end{align}
By definition of the proximal operator $prox_f(z_2)$ is in $mathcal C$.
The optimality condition for $prox_f(z_1)$ is:
begin{align}
(f_{prox_f(z_1)} + prox_f(z_1) - z_1)^top (prox_f (z_2) - prox_f (z_1)) &ge 0 label{eq:opt_cond_1}\
end{align}
and for $prox_f(z_2)$:
begin{align}
(f_{prox_f(z_2)} + prox_f(z_2) - z_2)^top (prox_f (z_1) - prox_f (z_2)) &ge 0 \
(z_2 - f_{prox_f(z_2)} - prox_f(z_2) - )^top (prox_f (z_2) - prox_f (z_1)) &ge 0 label{eq:opt_cond_2}
end{align}
The subgradient properties for $f_{prox_f(z_1)}$ and $f_{prox_f (z_2)}$ are:
begin{align}
f(prox_f(z_2)) - f(prox_f(z_1)) ge f_{prox_f(z_2)}^top (prox_f(z_2) - prox_f(z_1)) label{eq:subgrad_1}\
f(prox_f(z_1)) - f(prox_f(z_2)) ge f_{prox_f(z_1)}^top (prox_f(z_1) - prox_f(z_2)) label{eq:subgrad_2}\
end{align}
Adding the two subgradient properties results to:
begin{align}
0 ge (f_{prox_f(z_2)} - f_{prox_f(z_1)})^top (prox_f(z_2) - prox_f(z_1)) label{eq:subgrad_zero}
end{align}
Adding the two optimality conditions leads us to:
begin{align}
((z_2 - z_1) + (f_{prox_f(z_1)} - f_{prox_f(z_2)}) + (prox_f(z_1) - prox_f(z_2)))^top (prox_f(z_2) - prox_f(z_1)) &ge 0 \
(- (z_2 - z_1) + (f_{prox_f(z_2)} - f_{prox_f(z_1)}) + (prox_f(z_2) - prox_f(z_1)))^top (prox_f(z_2) - prox_f(z_1)) &le 0
end{align}
We use that the subgradient properties to bound the subgradients by zero:
begin{align}
(prox_f(z_2) - prox_f(z_1))^top (prox_f(z_2) - prox_f(z_1)) &le ((z_2 - z_1) - (f_{prox_f(z_2)} - f_{prox_f(z_1)}))^top (prox_f(z_2) - prox_f(z_1)) \
(prox_f(z_2) - prox_f(z_1))^toprox_f (prox_f(z_2) - prox_f(z_1)) &le (z_2 - z_1)^top (prox_f(z_2) - prox_f(z_1))
end{align}
And use Cauchy-Schwarz:
begin{align}
||prox_f(z_2) - prox_f(z_1)||^2 &le ||z_2 - z_1|| ||prox_f(z_2) - prox_f(z_1)|| \
||prox_f(z_2) - prox_f(z_1)|| &le ||z_2 - z_1|| \
&&square
end{align}
$endgroup$
1
$begingroup$
Please use 'operatorname{prox}'
$endgroup$
– Silvia Ghinassi
Aug 22 '16 at 14:59
$begingroup$
Okay, thanks for the remark!
$endgroup$
– martinzellner
Aug 24 '16 at 9:52
add a comment |
$begingroup$
You can proof the non-expansiveness of the proximal operator as following:
$DeclareMathOperator{prox}{prox}$
Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].
You can imagine the the proximal operator as some kind of generalised projection operator.
Proof:
We pick two arbitrary points $z_1$, $z_2$.
begin{align}
(z_1 - prox_f(z_1))^top (x- prox_f(z_1)) &le 0 quad forall x in mathcal C \
end{align}
By definition of the proximal operator $prox_f(z_2)$ is in $mathcal C$.
The optimality condition for $prox_f(z_1)$ is:
begin{align}
(f_{prox_f(z_1)} + prox_f(z_1) - z_1)^top (prox_f (z_2) - prox_f (z_1)) &ge 0 label{eq:opt_cond_1}\
end{align}
and for $prox_f(z_2)$:
begin{align}
(f_{prox_f(z_2)} + prox_f(z_2) - z_2)^top (prox_f (z_1) - prox_f (z_2)) &ge 0 \
(z_2 - f_{prox_f(z_2)} - prox_f(z_2) - )^top (prox_f (z_2) - prox_f (z_1)) &ge 0 label{eq:opt_cond_2}
end{align}
The subgradient properties for $f_{prox_f(z_1)}$ and $f_{prox_f (z_2)}$ are:
begin{align}
f(prox_f(z_2)) - f(prox_f(z_1)) ge f_{prox_f(z_2)}^top (prox_f(z_2) - prox_f(z_1)) label{eq:subgrad_1}\
f(prox_f(z_1)) - f(prox_f(z_2)) ge f_{prox_f(z_1)}^top (prox_f(z_1) - prox_f(z_2)) label{eq:subgrad_2}\
end{align}
Adding the two subgradient properties results to:
begin{align}
0 ge (f_{prox_f(z_2)} - f_{prox_f(z_1)})^top (prox_f(z_2) - prox_f(z_1)) label{eq:subgrad_zero}
end{align}
Adding the two optimality conditions leads us to:
begin{align}
((z_2 - z_1) + (f_{prox_f(z_1)} - f_{prox_f(z_2)}) + (prox_f(z_1) - prox_f(z_2)))^top (prox_f(z_2) - prox_f(z_1)) &ge 0 \
(- (z_2 - z_1) + (f_{prox_f(z_2)} - f_{prox_f(z_1)}) + (prox_f(z_2) - prox_f(z_1)))^top (prox_f(z_2) - prox_f(z_1)) &le 0
end{align}
We use that the subgradient properties to bound the subgradients by zero:
begin{align}
(prox_f(z_2) - prox_f(z_1))^top (prox_f(z_2) - prox_f(z_1)) &le ((z_2 - z_1) - (f_{prox_f(z_2)} - f_{prox_f(z_1)}))^top (prox_f(z_2) - prox_f(z_1)) \
(prox_f(z_2) - prox_f(z_1))^toprox_f (prox_f(z_2) - prox_f(z_1)) &le (z_2 - z_1)^top (prox_f(z_2) - prox_f(z_1))
end{align}
And use Cauchy-Schwarz:
begin{align}
||prox_f(z_2) - prox_f(z_1)||^2 &le ||z_2 - z_1|| ||prox_f(z_2) - prox_f(z_1)|| \
||prox_f(z_2) - prox_f(z_1)|| &le ||z_2 - z_1|| \
&&square
end{align}
$endgroup$
1
$begingroup$
Please use 'operatorname{prox}'
$endgroup$
– Silvia Ghinassi
Aug 22 '16 at 14:59
$begingroup$
Okay, thanks for the remark!
$endgroup$
– martinzellner
Aug 24 '16 at 9:52
add a comment |
$begingroup$
You can prove the (firm) non-expansiveness of the proximal operator as follows:
$\DeclareMathOperator{\prox}{prox}$
Note that the proof is analogous to the proof of the (firm) non-expansiveness of the projection operator [1]: when $f$ is the indicator function of a convex set $\mathcal C$, $\prox_f$ is the projection onto $\mathcal C$ and its optimality condition reads
\begin{align}
(z_1 - \prox_f(z_1))^\top (x - \prox_f(z_1)) &\le 0 \quad \forall x \in \mathcal C,
\end{align}
so you can think of the proximal operator as a kind of generalised projection.
Proof:
We pick two arbitrary points $z_1$, $z_2$, and write $f_{\prox_f(z_1)} \in \partial f(\prox_f(z_1))$ and $f_{\prox_f(z_2)} \in \partial f(\prox_f(z_2))$ for the subgradients that appear in the optimality conditions.
The optimality condition for $\prox_f(z_1)$, evaluated in the direction of $\prox_f(z_2)$, is:
\begin{align}
(f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0
\end{align}
and for $\prox_f(z_2)$:
\begin{align}
(f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f(z_1) - \prox_f(z_2)) &\ge 0 \\
(z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0
\end{align}
The subgradient inequalities for $f_{\prox_f(z_1)}$ and $f_{\prox_f(z_2)}$ are:
\begin{align}
f(\prox_f(z_2)) - f(\prox_f(z_1)) &\ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)) \\
f(\prox_f(z_1)) - f(\prox_f(z_2)) &\ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2))
\end{align}
Adding the two subgradient inequalities yields the monotonicity of the subdifferential:
\begin{align}
0 \le (f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1))
\end{align}
Adding the two optimality conditions leads us to:
\begin{align}
((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0 \\
(-(z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0
\end{align}
Rearranging, and then dropping the subgradient term (which is non-negative by the monotonicity above), gives firm non-expansiveness:
\begin{align}
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\
||\prox_f(z_2) - \prox_f(z_1)||^2 &\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1))
\end{align}
Applying Cauchy-Schwarz to the right-hand side then gives non-expansiveness:
\begin{align}
||\prox_f(z_2) - \prox_f(z_1)||^2 &\le ||z_2 - z_1|| \, ||\prox_f(z_2) - \prox_f(z_1)|| \\
||\prox_f(z_2) - \prox_f(z_1)|| &\le ||z_2 - z_1|| \qquad \square
\end{align}
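As a quick sanity check, the firm non-expansiveness inequality can be verified numerically whenever the prox has a closed form. A minimal sketch in Python (my own illustration, not from the paper), using $f(x) = \lambda ||x||_1$, whose prox is elementwise soft-thresholding:

```python
import numpy as np

def prox_l1(z, lam=1.0):
    # prox of f(x) = lam * ||x||_1 is elementwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
for _ in range(1000):
    z1, z2 = rng.normal(size=5), rng.normal(size=5)
    d = prox_l1(z2) - prox_l1(z1)
    # firm non-expansiveness: ||d||^2 <= (z2 - z1)^T d
    assert d @ d <= (z2 - z1) @ d + 1e-12
print("firm non-expansiveness holds on all sampled pairs")
```

The small tolerance only guards against floating-point round-off; the inequality itself holds exactly.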
$endgroup$
edited Aug 24 '16 at 9:52
answered Aug 22 '16 at 14:35
martinzellner
112
$begingroup$
I can explain the first question. The author said "if $\operatorname{prox}_f$ were a contraction", which is not guaranteed in general. The Banach fixed-point theorem tells us that repeatedly applying a contraction converges to its unique fixed point. So if $\operatorname{prox}_f$ is a contraction, then repeatedly applying the proximal operator will converge to its fixed point, which by the fixed point property is a minimizer of $f$.
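To illustrate the Banach argument with a case where $\operatorname{prox}_f$ really is a contraction, here is a small Python sketch (my own example, not from the paper) with the strongly convex $f(x) = x^2$, for which $\operatorname{prox}_f(v) = v/3$:

```python
def prox(v):
    # prox of f(x) = x^2: argmin_x x^2 + (1/2)(x - v)^2 = v / 3,
    # a contraction with Lipschitz constant 1/3 < 1
    return v / 3.0

x = 10.0
for _ in range(60):
    x = prox(x)  # Banach fixed-point iteration converges geometrically

# the unique fixed point is 0, which is also the minimizer of f(x) = x^2
print(abs(x) < 1e-12)  # True
```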
$endgroup$
answered Jul 26 '15 at 8:44
chuan
11