When will empirical risk minimization with inductive bias fail?
I am working on an assignment and I am stuck on this problem:
Give an example of a hypothesis class $H$, a domain $X$, a distribution $P$ over $X \times \{0,1\}$, and an $ERM_H$ learning algorithm $A$, such that for some $h^* \in H$ and for every sample size $m$, $h^*$ is a much better hypothesis than the one picked by $A$. That is, $\mathbb{E}_{S \sim P^m}\bigl[L_P(A(S))\bigr] \ge L_P(h^*) + \tfrac{1}{2}$.
I have no idea how to do that. Can anybody give me some hints?
machine-learning
asked Jan 28 at 17:24


Zeng Yongjian
1 Answer
The first observation is that if $H$ has finite VC dimension, then the empirical risk converges to the true risk uniformly over $H$ as the sample size grows, so ERM produces better and better hypotheses. This suggests looking for a class with infinite VC dimension. Regarding the distribution, to make things easy for us and hard for ERM, we should look for a uniform distribution.
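For reference, one standard form of that uniform-convergence guarantee reads as follows: if $H$ has VC dimension $d < \infty$, then with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$,
$$\sup_{h \in H} \bigl| L_S(h) - L_P(h) \bigr| \;\le\; C\,\sqrt{\frac{d \log(m/d) + \log(1/\delta)}{m}}$$
for a universal constant $C$, so the ERM hypothesis is nearly optimal in $H$ once $m \gg d$. No bound of this form can hold when $d = \infty$, which is where we look for a counterexample.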
An example that I think works is the following:
- $X = [0,1]$
- $H$ is the set of all functions $h\colon X \rightarrow \{0,1\}$; $H$ clearly has infinite VC dimension
- $c \in H$ is some arbitrary fixed function, and $P$ is the distribution over $X \times \{0,1\}$ induced by the uniform distribution over $X$ and $c$ (draw $x$ uniformly from $[0,1]$ and label it $y = c(x)$)
Since $c \in H$ we can take $h^{*} = c$, so the true risk is $L_{P}(h^{*}) = 0$.
The sample always has measure-zero support in $X$ (it is a finite set). The algorithm $A$ will pick a hypothesis with zero empirical risk, but since it knows nothing about the values $c$ takes on the rest of the domain, it cannot do better than random guessing there, and its true risk will be $1/2$.
To make this precise, let $P_{H}$ be the uniform distribution over $H$, so that for every $x \in X$ and $f \sim P_{H}$ we have $P(f(x)=0) = P(f(x)=1) = 1/2$. For a sample $S$ we can define a new distribution $P_{H \mid S}$ by conditioning on the values of the sample: $f(x_{i}) = c(x_{i})$ for all $x_{i} \in S$. This, combined with the fact that $\operatorname{supp}(S)$ has measure zero, gives
$$\mathbb{E}_{x \sim P}\Bigl[\mathbb{E}_{h \sim P_{H \mid S}}\bigl[\mathbf{1}\{A(S)(x) \ne h(x)\}\bigr]\Bigr] = \tfrac{1}{2}.$$
Therefore there must exist at least one hypothesis $h$ that agrees with the sample labels (just as $A(S)$ does) yet disagrees with $A(S)$ on at least half of the rest of the domain. Letting $c$ be such a hypothesis, we get $L_{P}(A(S)) \ge 1/2$.
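To see the construction behave as claimed, here is a minimal simulation sketch in Python. It makes two illustrative choices that are not part of the problem statement: the target $c$ is implemented as a fixed pseudo-random function of $x$ (via hashing), and the ERM rule simply memorizes the training labels and predicts $0$ on unseen points. For every sample size $m$ the empirical risk is $0$, while the estimated true risk stays near $1/2$.

```python
import hashlib
import numpy as np

def c(x: float) -> int:
    """A fixed but 'patternless' target c: [0,1] -> {0,1} (illustrative choice)."""
    return hashlib.sha256(np.float64(x).tobytes()).digest()[0] & 1

def erm_memorizer(train_x, train_y):
    """An ERM rule over H = all functions: memorize the sample, predict 0 elsewhere.
    Its empirical risk is 0 by construction."""
    memory = dict(zip(train_x.tolist(), train_y.tolist()))
    return lambda x: memory.get(float(x), 0)

rng = np.random.default_rng(0)
m, n_test = 1000, 100_000

train_x = rng.uniform(0.0, 1.0, size=m)
train_y = np.array([c(x) for x in train_x])
h = erm_memorizer(train_x, train_y)

# Fresh uniform test points almost surely never coincide with training points,
# so the memorizer is forced to guess there.
test_x = rng.uniform(0.0, 1.0, size=n_test)
test_y = np.array([c(x) for x in test_x])
pred = np.array([h(x) for x in test_x])

print("empirical risk:", np.mean([h(x) != y for x, y in zip(train_x, train_y)]))  # 0.0
print("estimated true risk:", np.mean(pred != test_y))                            # ~ 0.5
```

Increasing `m` leaves the estimated true risk at roughly $1/2$, matching the argument above: the training points form a measure-zero set, so no amount of data constrains the hypothesis off the sample.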
edited Feb 3 at 14:40
answered Feb 3 at 11:12
Popescu Claudiu