Why can I write this function in terms of its previous values?












1












$begingroup$


From Understanding Machine Learning: From Theory to Algorithms



In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?





enter image description here










share|cite|improve this question











$endgroup$












  • $begingroup$
    Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
    $endgroup$
    – David K
    Jan 25 at 16:02










  • $begingroup$
    The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
    $endgroup$
    – Oliver G
    Jan 25 at 16:07












  • $begingroup$
    What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
    $endgroup$
    – David K
    Jan 25 at 16:20










  • $begingroup$
    I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
    $endgroup$
    – Oliver G
    Jan 25 at 16:23


















1












$begingroup$


From Understanding Machine Learning: From Theory to Algorithms



In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?





enter image description here










share|cite|improve this question











$endgroup$












  • $begingroup$
    Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
    $endgroup$
    – David K
    Jan 25 at 16:02










  • $begingroup$
    The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
    $endgroup$
    – Oliver G
    Jan 25 at 16:07












  • $begingroup$
    What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
    $endgroup$
    – David K
    Jan 25 at 16:20










  • $begingroup$
    I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
    $endgroup$
    – Oliver G
    Jan 25 at 16:23
















1












1








1


0



$begingroup$


From Understanding Machine Learning: From Theory to Algorithms



In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?





enter image description here










share|cite|improve this question











$endgroup$




From Understanding Machine Learning: From Theory to Algorithms



In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?





enter image description here







proof-explanation machine-learning






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Jan 25 at 15:55







Oliver G

















asked Jan 25 at 15:20









Oliver GOliver G

1,4381632




1,4381632












  • $begingroup$
    Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
    $endgroup$
    – David K
    Jan 25 at 16:02










  • $begingroup$
    The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
    $endgroup$
    – Oliver G
    Jan 25 at 16:07












  • $begingroup$
    What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
    $endgroup$
    – David K
    Jan 25 at 16:20










  • $begingroup$
    I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
    $endgroup$
    – Oliver G
    Jan 25 at 16:23




















  • $begingroup$
    Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
    $endgroup$
    – David K
    Jan 25 at 16:02










  • $begingroup$
    The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
    $endgroup$
    – Oliver G
    Jan 25 at 16:07












  • $begingroup$
    What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
    $endgroup$
    – David K
    Jan 25 at 16:20










  • $begingroup$
    I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
    $endgroup$
    – Oliver G
    Jan 25 at 16:23


















$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02




$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02












$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07






$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07














$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20




$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20












$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23






$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23












1 Answer
1






active

oldest

votes


















1












$begingroup$

As discussed in comments, let's define $F(theta)$ by



$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$



I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$



As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$



I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$



So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$



Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:




  • If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.


  • If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.



The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$



If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$






share|cite|improve this answer











$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "69"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3087210%2fwhy-can-i-write-this-function-in-terms-of-its-previous-values%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1












    $begingroup$

    As discussed in comments, let's define $F(theta)$ by



    $$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
    + sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$



    I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$



    As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
    and then for each of the (sorted) values
    $x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
    we compute a value $y_i,$ which is either $1$ or $-1.$



    I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
    arranged in that order from left to right along the number line
    (because they have been sorted), with each of these $m$ points labeled either
    $y_i=1$ or $y_i=-1.$



    So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$



    Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
    we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
    So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
    as we pass over $x_{i,j}$:




    • If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.


    • If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.



    The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
    says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$



    If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$






    share|cite|improve this answer











    $endgroup$


















      1












      $begingroup$

      As discussed in comments, let's define $F(theta)$ by



      $$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
      + sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$



      I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$



      As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
      and then for each of the (sorted) values
      $x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
      we compute a value $y_i,$ which is either $1$ or $-1.$



      I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
      arranged in that order from left to right along the number line
      (because they have been sorted), with each of these $m$ points labeled either
      $y_i=1$ or $y_i=-1.$



      So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$



      Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
      we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
      So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
      as we pass over $x_{i,j}$:




      • If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.


      • If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.



      The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
      says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$



      If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$






      share|cite|improve this answer











      $endgroup$
















        1












        1








        1





        $begingroup$

        As discussed in comments, let's define $F(theta)$ by



        $$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
        + sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$



        I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$



        As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
        and then for each of the (sorted) values
        $x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
        we compute a value $y_i,$ which is either $1$ or $-1.$



        I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
        arranged in that order from left to right along the number line
        (because they have been sorted), with each of these $m$ points labeled either
        $y_i=1$ or $y_i=-1.$



        So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$



        Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
        we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
        So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
        as we pass over $x_{i,j}$:




        • If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.


        • If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.



        The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
        says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$



        If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$






        share|cite|improve this answer











        $endgroup$



        As discussed in comments, let's define $F(theta)$ by



        $$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
        + sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$



        I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$



        As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
        and then for each of the (sorted) values
        $x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
        we compute a value $y_i,$ which is either $1$ or $-1.$



        I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
        arranged in that order from left to right along the number line
        (because they have been sorted), with each of these $m$ points labeled either
        $y_i=1$ or $y_i=-1.$



        So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$



        Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
        we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
        So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
        as we pass over $x_{i,j}$:




        • If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.


        • If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.



        The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
        says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$



        If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$







        share|cite|improve this answer














        share|cite|improve this answer



        share|cite|improve this answer








        edited Jan 26 at 13:08

























        answered Jan 26 at 1:48









        David KDavid K

        55.2k344120




        55.2k344120






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Mathematics Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3087210%2fwhy-can-i-write-this-function-in-terms-of-its-previous-values%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            How to fix TextFormField cause rebuild widget in Flutter

            Npm cannot find a required file even through it is in the searched directory