Proximal gradient method justification

If $f$ and $g$ are respectively a differentiable function and a convex, lower semi-continuous function, then the algorithm defined by:



$$ x^{k+1} = \text{prox}_{\gamma g}\left[x^{k} - \gamma \nabla f(x^{k})\right] $$



converges to $\text{argmin}\,[f+g]$.



This is justified by the fact that if $x^{*}$ is a minimizer of $f+g$, then:
$$ x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right] $$



But I do not understand this relation. Why is it true?



That is, why does $x^{*} = \text{argmin}\,[f+g] \;\Leftrightarrow\; x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$ hold?
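To make the update concrete, here is a minimal sketch of one iteration in Python, assuming purely for illustration that $f(x) = \tfrac12\|Ax - b\|^2$ and $g(x) = \lambda\|x\|_1$, so that $\text{prox}_{\gamma g}$ is coordinatewise soft-thresholding (the function names below are illustrative, not from any particular library):

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1: shrink each coordinate toward zero by tau
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_step(x, A, b, lam, gamma):
    # One update x^{k+1} = prox_{gamma*g}(x^k - gamma * grad f(x^k))
    grad_f = A.T @ (A @ x - b)   # gradient of f(x) = 0.5*||Ax - b||^2
    return soft_threshold(x - gamma * grad_f, gamma * lam)
```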










      optimization convex-analysis gradient-descent semicontinuous-functions






          2 Answers

I will show below that if $x^* = \text{prox}_{\gamma g}[x^{*} - \gamma \nabla f(x^{*})]$, then $x^* \in \text{argmin}\,\{f(x)+g(x)\}$.

Plugging in the definition of the proximal operator, we have
$$\text{prox}_{\gamma g}[x^{*} - \gamma \nabla f(x^{*})] = \text{argmin}_{x} \left\{\gamma g(x) + \frac{1}{2}\big\|x - \big(x^* - \gamma \nabla f(x^*)\big)\big\|^2\right\}.$$
Since $x^* = \text{prox}_{\gamma g}[x^{*} - \gamma \nabla f(x^{*})]$, i.e. $x^*$ minimizes the objective on the right, Fermat's rule gives
$$0 \in \partial(\gamma g)(x) + \big(x - (x^* - \gamma \nabla f(x^*))\big) \quad \text{at } x = x^*.$$
Substituting $x = x^*$, we get
$$0 \in \gamma\,\partial g(x^*) + \gamma\,\nabla f(x^*).$$
This is equivalent to saying that $0 \in \partial F(x^*)$, where $F(x) = f(x) + g(x)$, so $x^*$ is a minimizer. The other direction of the proof is very similar.

Note: the last step, $0 \in \partial F(x^*)$, uses the sum rule $\partial F = \nabla f + \partial g$, which need not always hold (check the qualification conditions for subdifferentials).
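As a sanity check, the fixed-point relation can be verified numerically. The sketch below assumes, purely for illustration, a lasso-type problem with $f(x) = \tfrac12\|Ax - b\|^2$ and $g(x) = \lambda\|x\|_1$ (so $\text{prox}_{\gamma g}$ is soft-thresholding) and a step size $\gamma = 1/L$ with $L = \|A\|_2^2$:

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 10)), rng.standard_normal(40)
lam = 0.1
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step 1/L, L = Lipschitz constant of grad f

# Run the proximal gradient iteration long enough to reach (approximately) x*.
x = np.zeros(10)
for _ in range(5000):
    x = soft_threshold(x - gamma * A.T @ (A @ x - b), gamma * lam)

# A minimizer should be a fixed point: one more step leaves x essentially unchanged.
x_next = soft_threshold(x - gamma * A.T @ (A @ x - b), gamma * lam)
print(np.linalg.norm(x - x_next))   # ~0, up to round-off and remaining iteration error
```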







Here's an explanation which assumes that we already understand that the prox-operator of $g$ with parameter $t > 0$ is the operator $(I + t\,\partial g)^{-1}$, where $\partial g$ is the subdifferential of $g$.

I'll assume that $f$ is convex as well as differentiable, and that $g$ is convex and closed. Let $t > 0$. A point $x$ is a minimizer of $f + g$ if and only if
\begin{align}
&0 \in \nabla f(x) + \partial g(x) \\
\iff &x \in x + t\,\nabla f(x) + t\,\partial g(x) \\
\iff &x - t\,\nabla f(x) \in (I + t\,\partial g)(x) \\
\iff &x = (I + t\,\partial g)^{-1}\big(x - t\,\nabla f(x)\big).
\end{align}

The final equation is another way of saying that
$$
x = \text{prox}_{tg}\big(x - t\,\nabla f(x)\big).
$$

We can then solve this equation by fixed-point iteration, which yields the proximal gradient method.

By the way, if this derivation doesn't seem intuitive, there are other, more direct ways to discover the proximal gradient method. The viewpoint given here has the advantage of showing that the proximal gradient method is a fixed-point iteration, which helps with convergence proofs.
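In code, the fixed-point viewpoint amounts to repeatedly applying $T(x) = \text{prox}_{tg}(x - t\,\nabla f(x))$ until $x \approx T(x)$. A minimal sketch, with the gradient and prox supplied as callables (the names are illustrative, not a particular library's API):

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, t, tol=1e-10, max_iter=10_000):
    # Fixed-point iteration for T(x) = prox_{t*g}(x - t * grad_f(x)).
    # prox_g(v, t) should return prox_{t*g}(v).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = prox_g(x - t * grad_f(x), t)   # one application of T
        if np.linalg.norm(x_new - x) <= tol:   # x ~ T(x): approximate minimizer of f + g
            return x_new
        x = x_new
    return x
```

The stopping test is exactly the fixed-point condition from the derivation: once $x = T(x)$ (to tolerance), $x$ satisfies the optimality condition $0 \in \nabla f(x) + \partial g(x)$.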





