When will empirical risk minimization with inductive bias fail?


























I am working on an assignment and I am stuck on this problem:



Give an example of a hypothesis class $H$, a domain $X$, a distribution $P$ over $X \times \{0,1\}$, and an $\mathrm{ERM}_H$ learning algorithm $A$, such that for some $h^* \in H$ and for every sample size $m$, $h^*$ is a much better hypothesis than the one picked by $A$. That is, $\mathbb{E}[L_P(A(S))] \ge L_P(h^*) + \frac{1}{2}$.



I have no idea how to do that. Can anybody give me some hints?




































      machine-learning
















asked Jan 28 at 17:24 by Zeng Yongjian






















1 Answer































The first observation is that if $H$ has finite VC dimension, then the empirical risk converges uniformly to the true risk as the sample size grows, so ERM produces better and better hypotheses. This suggests looking for a class with infinite VC dimension. Regarding the distribution, to make things easy for us and hard for ERM, a uniform distribution is a natural choice.
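For context, the uniform convergence guarantee behind this first observation can be stated as follows (a standard VC bound, added here for reference and not part of the original answer; $C$ is an unspecified universal constant):

$$\sup_{h \in H} \bigl| L_S(h) - L_P(h) \bigr| \;\le\; C \sqrt{\frac{d + \log(1/\delta)}{m}} \quad \text{with probability at least } 1 - \delta, \qquad d = \mathrm{VCdim}(H).$$

In particular, any ERM hypothesis is then within $2C\sqrt{(d + \log(1/\delta))/m}$ of the best hypothesis in $H$. When $d = \infty$ the bound is vacuous, which is exactly the regime the construction below exploits.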



An example that I think works is the following:




1. $X = [0,1]$.

2. $H$ is the set of all functions $h : X \rightarrow \{0,1\}$; $H$ clearly has infinite VC dimension.

3. $c \in H$ is some arbitrary fixed function.

4. $P$ is the distribution over $X \times \{0,1\}$ induced by the uniform distribution over $X$ and the labeling function $c$.


Since $c \in H$ we can take $h^* = c$, so the true risk is $L_P(h^*) = 0$.



The support of the sample always has measure zero in $X$ (it is a finite set). The algorithm $A$ will pick a hypothesis with zero empirical risk, but since it knows nothing about the values $c$ takes on the rest of the domain, it cannot do better than random guessing there, and its true risk will be $1/2$.



To make this precise, let $P_H$ be the uniform distribution over $H$: for every $x \in X$ and $f \sim P_H$, we have $P(f(x)=0) = P(f(x)=1) = 1/2$. For a sample $S$ we can define a new probability distribution $P_{H \mid S}$ by conditioning on the values of the sample: $f(x_i) = c(x_i)$ for all $x_i \in S$. This, combined with the fact that $\mathrm{supp}(S)$ has measure zero, leads to $\mathbb{E}_{x \sim P}\bigl[\mathbb{E}_{h \sim P_{H \mid S}}[\mathbf{1}\{A(S)(x) \ne h(x)\}]\bigr] = 1/2$. Therefore there must exist at least one hypothesis $h$ that agrees with $A(S)$ on the sample (so it is indistinguishable from $c$ given $S$) but disagrees with $A(S)$ on at least half of the rest of the domain. Letting $c$ be this hypothesis, we get $L_P(A(S)) \ge 1/2$.
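To see the failure numerically, here is a small Monte-Carlo sketch of the construction. Everything in it is an illustrative choice of mine rather than something specified in the answer: the "memorizing" ERM rule that predicts $0$ on unseen points, and a hash-based target $c$ standing in for an arbitrary function the learner cannot extrapolate. The learner always achieves zero empirical risk, yet its estimated true risk hovers around $1/2$.

```python
import hashlib
import random

def c(x: float) -> int:
    """Target labeling: a pseudo-random bit determined by x, standing in
    for an 'arbitrary' c in H that a sample reveals nothing about."""
    return hashlib.sha256(repr(x).encode()).digest()[0] & 1

def erm_memorizer(sample):
    """An ERM_H rule: memorize the sample, predict 0 on unseen points.
    Its empirical risk is 0 on any sample, so it is a valid ERM over the
    class H of *all* functions X -> {0,1}."""
    memory = dict(sample)
    return lambda x: memory.get(x, 0)

random.seed(0)
m = 50
sample = [(x, c(x)) for x in (random.random() for _ in range(m))]
h = erm_memorizer(sample)

# Empirical risk: zero by construction (h reproduces the sample labels).
emp_risk = sum(h(x) != y for x, y in sample) / m

# Monte-Carlo estimate of the true risk L_P(h): fresh uniform points are
# (almost surely) outside the sample, where h says 0 but c is a fair coin.
n_test = 100_000
true_risk = sum(h(x) != c(x) for x in (random.random() for _ in range(n_test))) / n_test

print(f"empirical risk   = {emp_risk:.3f}")   # 0.000
print(f"estimated L_P(h) = {true_risk:.3f}")  # ~0.5
```

Running this prints an empirical risk of exactly $0$ and an estimated $L_P(A(S))$ near $0.5$, while $h^* = c$ has $L_P(h^*) = 0$, matching the gap of $1/2$ in the exercise.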































answered Feb 3 at 11:12 by Popescu Claudiu (edited Feb 3 at 14:40)





























