Firm non-expansiveness in the context of proximal operators


























$\newcommand{\prox}{\operatorname{prox}}$
Probably the most remarkable property of the proximal operator is the fixed point property:



The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*)$.



So, indeed, $f$ can be minimized by finding a fixed point of its proximal operator.
http://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf
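To make the fixed-point property concrete, here is a small sketch of my own (not from the paper), using the one prox with a well-known closed form: soft-thresholding, which is $\prox_f$ for $f(x) = \lambda \|x\|_1$. Iterating the prox drives any starting point to the minimizer $x^* = 0$, which is exactly the fixed point.

```python
import numpy as np

# Illustrative sketch (not from the paper): for f(x) = lam * ||x||_1,
# the proximal operator has the closed form "soft-thresholding":
#   prox_f(x)_i = sign(x_i) * max(|x_i| - lam, 0)
def prox_l1(x, lam=0.5):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# The unique minimizer of f is x* = 0, and prox_l1(0) = 0 is a fixed point.
# Repeatedly applying the prox drives any starting point to that fixed point.
x = np.array([3.0, -2.0, 0.2])
for _ in range(20):
    x = prox_l1(x)

print(x)  # converges to the minimizer [0, 0, 0]
```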



Question 1. In the paper linked above, the author says:



If $\prox_f$ were a contraction, i.e., Lipschitz continuous with
constant less than $1$, repeatedly applying $\prox_f$ would find a (here,
unique) fixed point



Why does a Lipschitz constant less than $1$ guarantee that a fixed point can be found by repeatedly applying the proximal operator?



Question 2. Let me quote a paragraph from the same paper:



It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:



$\|\prox_f(x) - \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) - \prox_f(y))$

$\forall x, y \in \mathbb{R}^n$



Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant $1$). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it turns out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$, has the same fixed points as $N$, and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e. the sequence:



$x^{k+1} := (1-\alpha)x^k + \alpha N(x^k)$



will converge to a fixed point of $N$. Put differently, damped iteration of
a nonexpansive operator will converge to one of its fixed points.



Operators of the form $(1-\alpha)I + \alpha N$, where $N$ is non-expansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.



Firmly nonexpansive operators are averaged: indeed, they are precisely the (1/2)-averaged
operators.
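The quoted claim about damped iteration can be checked numerically. A minimal sketch of my own, using $N = -I$ (mentioned in the quote as a nonexpansive operator whose plain iteration does not converge):

```python
import numpy as np

# Sketch of the quoted claim: N(x) = -x is nonexpansive (Lipschitz constant 1)
# with unique fixed point 0, yet plain iteration of N never converges.
N = lambda x: -x

x = np.array([1.0, -2.0])
for _ in range(101):
    x = N(x)
print(x)   # after an odd number of steps: [-1.0, 2.0]; it just oscillates

# The damped (averaged) iteration x <- (1 - a) x + a N(x), a in (0, 1),
# has the same fixed points as N and does converge.
a = 0.25
y = np.array([1.0, -2.0])
for _ in range(100):
    y = (1 - a) * y + a * N(y)
print(y)   # close to the fixed point [0, 0]
```

Here each damped step multiplies the iterate by $(1 - 2\alpha) = 0.5$, so convergence to the fixed point $0$ is geometric.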



Why "unless $f$ is strongly convex"?



What is the intuition behind the given expression for firm nonexpansiveness?



How can you show that firmly nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?



Is anyone aware of a proof that the proximal map is firmly nonexpansive?




















































asked Aug 27 '14 at 11:23 by trembik






















3 Answers


















Here's a short proof that proximal operators are firmly nonexpansive. Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be closed, convex, and proper (CCP). Assume that
\begin{equation}
u_1 = \operatorname{prox}_f(x_1) \quad \text{and} \quad u_2 = \operatorname{prox}_f(x_2).
\end{equation}

Equivalently,
\begin{equation}
\tag{$\spadesuit$} x_1 - u_1 \in \partial f(u_1) \quad \text{and} \quad x_2 - u_2 \in \partial f(u_2).
\end{equation}

Now we use the (fundamentally important) fact that $\partial f$ is a monotone mapping, which tells us that
\begin{align}
&\langle x_1 - u_1 - (x_2 - u_2),\, u_1 - u_2 \rangle \geq 0 \\
\tag{$\heartsuit$}\implies{}& \langle x_1 - x_2,\, u_1 - u_2 \rangle \geq \| u_1 - u_2 \|_2^2.
\end{align}

This shows that $\operatorname{prox}_f$ is firmly nonexpansive.



Comments:

• When working with prox-operators, often the very first step is to express $u = \operatorname{prox}_f(x)$ in the equivalent form $x - u \in \partial f(u)$. This is a statement you can work with, involving more primitive notions.

• The fact that $\partial f$ is monotone is so fundamental that it is extremely tempting to invoke monotonicity once we have written down equation $(\spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators": set-valued mappings (such as $\partial f$) that satisfy the monotonicity property. From this viewpoint, the most important fact about $\partial f$ is that it is monotone.

• The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might have been discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(\heartsuit)$ rather quickly, and would notice that $(\heartsuit)$ implies that $\operatorname{prox}_f$ is nonexpansive (because $\langle x_1 - x_2, u_1 - u_2 \rangle \leq \|x_1 - x_2\|_2 \| u_1 - u_2 \|_2$ by Cauchy–Schwarz), but it may seem wasteful to replace the inequality $(\heartsuit)$ with the inequality $\|u_1 - u_2\|_2 \leq \|x_1 - x_2\|_2$, because this latter inequality is less tight.
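As a numerical sanity check of the inequality $(\heartsuit)$, here is a sketch of my own (not part of the answer above) using a prox with a closed form, soft-thresholding for the $\ell_1$ norm:

```python
import numpy as np

# Numerical sanity check (my own sketch, not part of the proof): verify
#   <x1 - x2, u1 - u2> >= ||u1 - u2||_2^2   for u_i = prox_f(x_i),
# using the l1-norm prox (soft-thresholding) as a concrete example.
def prox_l1(x, lam=1.0):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x1, x2 = rng.normal(size=5), rng.normal(size=5)
    u1, u2 = prox_l1(x1), prox_l1(x2)
    lhs = np.dot(u1 - u2, u1 - u2)   # ||u1 - u2||^2
    rhs = np.dot(x1 - x2, u1 - u2)   # <x1 - x2, u1 - u2>
    ok = ok and (lhs <= rhs + 1e-12)

print(ok)  # True: firm nonexpansiveness holds on every sample
```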






answered Aug 22 '16 at 15:23, edited Jan 22 at 17:14, by littleO






















You can prove the (firm) non-expansiveness of the proximal operator as follows:
$\DeclareMathOperator{\prox}{prox}$

Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].

You can think of the proximal operator as a kind of generalised projection operator.

Proof:
We pick two arbitrary points $z_1$, $z_2$. (For the projection onto a convex set $\mathcal C$, the analogous optimality condition would be
\begin{align}
(z_1 - \prox_f(z_1))^\top (x - \prox_f(z_1)) \le 0 \quad \forall x \in \mathcal C,
\end{align}
and $\prox_f(z_2)$ is in $\mathcal C$ by definition of the projection.)

Writing $f_{\prox_f(z)}$ for a subgradient of $f$ at $\prox_f(z)$, the optimality condition for $\prox_f(z_1)$ is:
\begin{align}
(f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0
\end{align}

and for $\prox_f(z_2)$:
\begin{align}
(f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f(z_1) - \prox_f(z_2)) &\ge 0 \\
(z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0
\end{align}

The subgradient inequalities at $\prox_f(z_1)$ and $\prox_f(z_2)$ are:

\begin{align}
f(\prox_f(z_2)) - f(\prox_f(z_1)) &\ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)) \\
f(\prox_f(z_1)) - f(\prox_f(z_2)) &\ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2))
\end{align}
Adding the two subgradient inequalities gives the monotonicity of the subdifferential:
\begin{align}
(f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0
\end{align}

Adding the two optimality conditions leads us to:
\begin{align}
((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0 \\
(-(z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0
\end{align}

Rearranging, and using the monotonicity above to drop the (nonnegative) subgradient term, yields firm non-expansiveness:
\begin{align}
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1))
\end{align}
And use Cauchy–Schwarz to get plain non-expansiveness:
\begin{align}
\|\prox_f(z_2) - \prox_f(z_1)\|^2 &\le \|z_2 - z_1\| \, \|\prox_f(z_2) - \prox_f(z_1)\| \\
\|\prox_f(z_2) - \prox_f(z_1)\| &\le \|z_2 - z_1\| \qquad \square
\end{align}
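Since this proof leans on the analogy with projections, here is a quick numerical check of my own (not part of the answer) that a projection, here onto the box $[0,1]^n$, satisfies the same firm non-expansiveness inequality:

```python
import numpy as np

# Sketch of the projection analogy (my own illustration): projection onto a
# convex set is the prox of the set's indicator function, and it satisfies the
# firm non-expansiveness inequality. Here C = [0, 1]^n, so projection is clipping.
def proj_box(z):
    return np.clip(z, 0.0, 1.0)

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    z1, z2 = rng.normal(size=4) * 3, rng.normal(size=4) * 3
    p1, p2 = proj_box(z1), proj_box(z2)
    # ||P(z2) - P(z1)||^2 <= (z2 - z1)^T (P(z2) - P(z1))
    ok = ok and (np.dot(p2 - p1, p2 - p1) <= np.dot(z2 - z1, p2 - p1) + 1e-12)

print(ok)  # True on every sample
```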















• "Please use `\operatorname{prox}`." – Silvia Ghinassi, Aug 22 '16 at 14:59

• "Okay, thanks for the remark!" – martinzellner, Aug 24 '16 at 9:52



















I can explain the first question. The author said "if $\prox_f$ were a contraction", which is not guaranteed in general. The Banach fixed-point theorem tells us that repeatedly applying a contraction converges to its unique fixed point. So if $\prox_f$ is a contraction, then repeatedly applying the proximal operator will find that fixed point.
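A minimal numeric illustration of the Banach fixed-point argument (my own example, not from the answer): a contraction with Lipschitz constant $1/2$ whose iterates converge geometrically to the unique fixed point.

```python
# Sketch of the Banach fixed-point argument (my own example): T(x) = 0.5*x + 1
# is a contraction with Lipschitz constant 0.5, so iterating it converges
# geometrically to the unique fixed point x* = 2 (where T(2) = 2).
def T(x):
    return 0.5 * x + 1.0

x = 100.0
for _ in range(60):
    x = T(x)

print(x)  # approaches 2.0; the error shrinks by a factor 0.5 each step
```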



















              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "69"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              noCode: true, onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f910803%2ffirm-non-expansiveness-in-the-context-of-proximal-operators%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5












              $begingroup$

              Here's a short proof that proximal operators are firmly nonexpansive. Let $f:mathbb R^n to mathbb R cup {infty}$ be closed convex proper (CCP). Assume that
              begin{equation}
              u_1 = operatorname{prox}_f(x_1) quad text{and} quad u_2 = operatorname{prox}_f(x_2).
              end{equation}

              Equivalently,
              begin{equation}
              tag{$spadesuit$} x_1 - u_1 in partial f(u_1) quad text{and} quad x_2 - u_2 in partial f(u_2).
              end{equation}

              Now we use the (fundamentally important) fact that $partial f$ is a monotone mapping, which tells us that
              begin{align}
              &langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 rangle geq 0 \
              tag{$heartsuit$}implies & langle x_1 - x_2, u_1 - u_2 rangle geq | u_1 - u_2 |_2^2.
              end{align}

              This shows that $operatorname{prox}_f$ is firmly nonexpansive.



              Comments:




              • When working with prox-operators, often the very first step is to express $u = operatorname{prox}_f(x)$ in the equivalent form $x - u in partial f(u)$. This is a statement you can work with, involving more primitive notions.

              • The fact that $partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $partial f$ is that it is monotone.

              • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(heartsuit)$ rather quickly. A mathematician would notice that inequality $(heartsuit)$ implies that $operatorname{prox}_f$ is nonexpansive (because $langle x_1 - x_2, u_1 - u_2 rangle leq |x_1 - x_2|_1 | u_1 - u_2 |_2$), but it may seem wasteful to replace the inequality $(heartsuit)$ with the inequality $|u_1 - u_2 |_2 leq |x_1 - x_2 |_2$, because this latter inequality is less tight.






              share|cite|improve this answer











              $endgroup$


















                5












                $begingroup$

                Here's a short proof that proximal operators are firmly nonexpansive. Let $f:mathbb R^n to mathbb R cup {infty}$ be closed convex proper (CCP). Assume that
                begin{equation}
                u_1 = operatorname{prox}_f(x_1) quad text{and} quad u_2 = operatorname{prox}_f(x_2).
                end{equation}

                Equivalently,
                begin{equation}
                tag{$spadesuit$} x_1 - u_1 in partial f(u_1) quad text{and} quad x_2 - u_2 in partial f(u_2).
                end{equation}

                Now we use the (fundamentally important) fact that $partial f$ is a monotone mapping, which tells us that
                begin{align}
                &langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 rangle geq 0 \
                tag{$heartsuit$}implies & langle x_1 - x_2, u_1 - u_2 rangle geq | u_1 - u_2 |_2^2.
                end{align}

                This shows that $operatorname{prox}_f$ is firmly nonexpansive.



                Comments:




                • When working with prox-operators, often the very first step is to express $u = operatorname{prox}_f(x)$ in the equivalent form $x - u in partial f(u)$. This is a statement you can work with, involving more primitive notions.

                • The fact that $partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $partial f$ is that it is monotone.

                • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(heartsuit)$ rather quickly. A mathematician would notice that inequality $(heartsuit)$ implies that $operatorname{prox}_f$ is nonexpansive (because $langle x_1 - x_2, u_1 - u_2 rangle leq |x_1 - x_2|_1 | u_1 - u_2 |_2$), but it may seem wasteful to replace the inequality $(heartsuit)$ with the inequality $|u_1 - u_2 |_2 leq |x_1 - x_2 |_2$, because this latter inequality is less tight.






                share|cite|improve this answer











                $endgroup$
















                  5












                  5








                  5





                  $begingroup$

                  Here's a short proof that proximal operators are firmly nonexpansive. Let $f:mathbb R^n to mathbb R cup {infty}$ be closed convex proper (CCP). Assume that
                  begin{equation}
                  u_1 = operatorname{prox}_f(x_1) quad text{and} quad u_2 = operatorname{prox}_f(x_2).
                  end{equation}

                  Equivalently,
                  begin{equation}
                  tag{$spadesuit$} x_1 - u_1 in partial f(u_1) quad text{and} quad x_2 - u_2 in partial f(u_2).
                  end{equation}

                  Now we use the (fundamentally important) fact that $partial f$ is a monotone mapping, which tells us that
                  begin{align}
                  &langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 rangle geq 0 \
                  tag{$heartsuit$}implies & langle x_1 - x_2, u_1 - u_2 rangle geq | u_1 - u_2 |_2^2.
                  end{align}

                  This shows that $operatorname{prox}_f$ is firmly nonexpansive.



                  Comments:




                  • When working with prox-operators, often the very first step is to express $u = operatorname{prox}_f(x)$ in the equivalent form $x - u in partial f(u)$. This is a statement you can work with, involving more primitive notions.

                  • The fact that $partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $partial f$ is that it is monotone.

                  • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(heartsuit)$ rather quickly. A mathematician would notice that inequality $(heartsuit)$ implies that $operatorname{prox}_f$ is nonexpansive (because $langle x_1 - x_2, u_1 - u_2 rangle leq |x_1 - x_2|_1 | u_1 - u_2 |_2$), but it may seem wasteful to replace the inequality $(heartsuit)$ with the inequality $|u_1 - u_2 |_2 leq |x_1 - x_2 |_2$, because this latter inequality is less tight.






                  share|cite|improve this answer











                  $endgroup$



                  Here's a short proof that proximal operators are firmly nonexpansive. Let $f:mathbb R^n to mathbb R cup {infty}$ be closed convex proper (CCP). Assume that
                  begin{equation}
                  u_1 = operatorname{prox}_f(x_1) quad text{and} quad u_2 = operatorname{prox}_f(x_2).
                  end{equation}

                  Equivalently,
                  begin{equation}
                  tag{$spadesuit$} x_1 - u_1 in partial f(u_1) quad text{and} quad x_2 - u_2 in partial f(u_2).
                  end{equation}

                  Now we use the (fundamentally important) fact that $partial f$ is a monotone mapping, which tells us that
                  begin{align}
                  &langle x_1 - u_1 - (x_2 - u_2), u_1 - u_2 rangle geq 0 \
                  tag{$heartsuit$}implies & langle x_1 - x_2, u_1 - u_2 rangle geq | u_1 - u_2 |_2^2.
                  end{align}

                  This shows that $operatorname{prox}_f$ is firmly nonexpansive.



                  Comments:




                  • When working with prox-operators, often the very first step is to express $u = operatorname{prox}_f(x)$ in the equivalent form $x - u in partial f(u)$. This is a statement you can work with, involving more primitive notions.

                  • The fact that $partial f$ is monotone is so fundamental, that it is extremely tempting to invoke monotonicity once we have written down equation $(spadesuit)$. In fact, much of the theory of prox-operators can be generalized to the setting of "monotone operators" which are set-valued mappings (such as $partial f$) which satisfy the monotonicity property. From this viewpoint, the most important fact about $partial f$ is that it is monotone.

                  • The fact that this derivation is so short, essentially only two lines, helps to explain how the "firmly nonexpansive" property might be discovered in the first place. A mathematician playing around with these definitions might stumble across the inequality $(heartsuit)$ rather quickly. A mathematician would notice that inequality $(heartsuit)$ implies that $operatorname{prox}_f$ is nonexpansive (because $langle x_1 - x_2, u_1 - u_2 rangle leq |x_1 - x_2|_1 | u_1 - u_2 |_2$), but it may seem wasteful to replace the inequality $(heartsuit)$ with the inequality $|u_1 - u_2 |_2 leq |x_1 - x_2 |_2$, because this latter inequality is less tight.







                  share|cite|improve this answer














                  share|cite|improve this answer



                  share|cite|improve this answer








                  edited Jan 22 at 17:14

























                  answered Aug 22 '16 at 15:23









                  littleOlittleO

                  29.9k647110




                  29.9k647110























                      1












                      $begingroup$

You can prove the (firm) non-expansiveness of the proximal operator as follows:
$\DeclareMathOperator{\prox}{prox}$

Note that the proof is analogous to the proof of the (firm) non-expansiveness of the projection operator [1]; you can think of the proximal operator as a generalised projection. For the projection onto a convex set $\mathcal C$, the corresponding optimality condition reads
\begin{align}
(z_1 - \prox_f(z_1))^\top (x - \prox_f(z_1)) &\le 0 \quad \forall x \in \mathcal C,
\end{align}
and by definition $\prox_f(z_2) \in \mathcal C$, so it is an admissible choice for $x$.

Proof:
Pick two arbitrary points $z_1$, $z_2$, and write $f_{\prox_f(z_i)}$ for a subgradient of $f$ at $\prox_f(z_i)$.

The optimality condition for $\prox_f(z_1)$ is
\begin{align}
(f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0,
\end{align}
and for $\prox_f(z_2)$ (the second line is the first multiplied by $-1$, with the inner product reordered):
\begin{align}
(f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f(z_1) - \prox_f(z_2)) &\ge 0 \\
(z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0
\end{align}

The subgradient inequalities at $\prox_f(z_1)$ and at $\prox_f(z_2)$ are:
\begin{align}
f(\prox_f(z_2)) - f(\prox_f(z_1)) &\ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)) \\
f(\prox_f(z_1)) - f(\prox_f(z_2)) &\ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2))
\end{align}
Adding the two subgradient inequalities gives the monotonicity of the subdifferential:
\begin{align}
(f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0
\end{align}

Adding the two optimality conditions leads us to:
\begin{align}
((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0 \\
(-(z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0
\end{align}

Rearranging, and then dropping the subgradient term (it is nonnegative by the monotonicity inequality above, so subtracting it only decreases the right-hand side):
\begin{align}
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\
(\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1))
\end{align}
The last inequality is exactly firm non-expansiveness. Applying Cauchy–Schwarz to its right-hand side and dividing through gives plain non-expansiveness:
\begin{align}
\|\prox_f(z_2) - \prox_f(z_1)\|^2 &\le \|z_2 - z_1\| \, \|\prox_f(z_2) - \prox_f(z_1)\| \\
\|\prox_f(z_2) - \prox_f(z_1)\| &\le \|z_2 - z_1\| \qquad \square
\end{align}
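As a quick numerical sanity check (my own addition, not part of the original answer), one can test the firm non-expansiveness inequality $\|\operatorname{prox}_f(z_2) - \operatorname{prox}_f(z_1)\|^2 \le (z_2 - z_1)^\top (\operatorname{prox}_f(z_2) - \operatorname{prox}_f(z_1))$ on a proximal operator with a closed form, here soft-thresholding, the prox of $f(x) = \lambda \|x\|_1$:

```python
import math
import random

def soft_threshold(z, lam):
    # prox of f(x) = lam * ||x||_1, applied elementwise (soft-thresholding)
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

random.seed(0)
lam = 0.5
ok = True
for _ in range(1000):
    z1 = [random.gauss(0, 1) for _ in range(5)]
    z2 = [random.gauss(0, 1) for _ in range(5)]
    u1, u2 = soft_threshold(z1, lam), soft_threshold(z2, lam)
    lhs = sum((b - a) ** 2 for a, b in zip(u1, u2))                     # ||u2 - u1||^2
    rhs = sum((y - x) * (b - a) for x, y, a, b in zip(z1, z2, u1, u2))  # (z2 - z1)^T (u2 - u1)
    ok = ok and lhs <= rhs + 1e-12
print(ok)  # True: the inequality holds on every random sample
```

The same check passes for any proper closed convex $f$ whose prox you can evaluate; for the projection onto a convex set (the prox of its indicator function) it reduces to the classical firm non-expansiveness of projections.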

















                      $endgroup$









                      • 1




                        $begingroup$
Please use '\operatorname{prox}'
                        $endgroup$
                        – Silvia Ghinassi
                        Aug 22 '16 at 14:59










                      • $begingroup$
                        Okay, thanks for the remark!
                        $endgroup$
                        – martinzellner
                        Aug 24 '16 at 9:52
















                      edited Aug 24 '16 at 9:52

























                      answered Aug 22 '16 at 14:35









martinzellner

                      $begingroup$

I can explain the first question. The author said "if $\operatorname{prox}_f$ were a contraction", which is not guaranteed in general. The Banach fixed-point theorem tells us that repeatedly applying a contraction converges to its unique fixed point. So if $\operatorname{prox}_f$ is a contraction, repeatedly applying the proximal operator yields that fixed point.
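To make the Banach argument concrete (my own sketch, not part of the original answer): for the strongly convex $f(x) = \tfrac{a}{2} x^2$ with $a > 0$, the proximal operator has the closed form $\operatorname{prox}_f(z) = z/(1+a)$, a contraction with constant $1/(1+a) < 1$, so the fixed-point iteration converges to the unique fixed point $0$, which is also the minimizer of $f$:

```python
# f(x) = (a/2) x^2 is strongly convex; minimizing (a/2) x^2 + (1/2)(x - z)^2
# over x gives the closed-form proximal operator prox_f(z) = z / (1 + a),
# a contraction with Lipschitz constant 1 / (1 + a) < 1.
a = 2.0
prox = lambda z: z / (1.0 + a)

x = 10.0                 # arbitrary starting point
for _ in range(50):      # Banach fixed-point iteration: x <- prox(x)
    x = prox(x)
print(abs(x) < 1e-12)    # True: converged to the unique fixed point 0, the minimizer of f
```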















                      $endgroup$


















                          answered Jul 26 '15 at 8:44









chuan














