Why can I write this function in terms of its previous values?
$begingroup$
From Understanding Machine Learning: From Theory to Algorithms
In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?
proof-explanation machine-learning
$endgroup$
add a comment |
$begingroup$
From Understanding Machine Learning: From Theory to Algorithms
In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?
proof-explanation machine-learning
$endgroup$
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23
add a comment |
$begingroup$
From Understanding Machine Learning: From Theory to Algorithms
In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?
proof-explanation machine-learning
$endgroup$
From Understanding Machine Learning: From Theory to Algorithms
In the red box below, why is $F(theta ' ) = F(theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?
proof-explanation machine-learning
proof-explanation machine-learning
edited Jan 25 at 15:55
Oliver G
asked Jan 25 at 15:20
Oliver GOliver G
1,4381632
1,4381632
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23
add a comment |
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
As discussed in comments, let's define $F(theta)$ by
$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$
I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$
As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$
I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$
So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$
Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:
If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.
If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.
The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$
If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3087210%2fwhy-can-i-write-this-function-in-terms-of-its-previous-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
As discussed in comments, let's define $F(theta)$ by
$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$
I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$
As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$
I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$
So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$
Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:
If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.
If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.
The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$
If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$
$endgroup$
add a comment |
$begingroup$
As discussed in comments, let's define $F(theta)$ by
$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$
I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$
As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$
I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$
So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$
Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:
If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.
If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.
The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$
If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$
$endgroup$
add a comment |
$begingroup$
As discussed in comments, let's define $F(theta)$ by
$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$
I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$
As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$
I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$
So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$
Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:
If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.
If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.
The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$
If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$
$endgroup$
As discussed in comments, let's define $F(theta)$ by
$$ F(theta) = sum_{i:y_i=-1} D_i mathbb{1}_{[x_{i,j}leqtheta]}
+ sum_{i:y_i=1} D_i mathbb{1}_{[x_{i,j}>theta]}.$$
I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$
As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$),
and then for each of the (sorted) values
$x_{1,j}, ldots, x_{i,j}, ldots, x_{m,j}$
we compute a value $y_i,$ which is either $1$ or $-1.$
I visualize this as $m$ points at coordinates $x_{1,j}, ldots, x_{m,j}$
arranged in that order from left to right along the number line
(because they have been sorted), with each of these $m$ points labeled either
$y_i=1$ or $y_i=-1.$
So $F(theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $theta$ (or exactly at $theta$) and all the times the label $y_i = 1$ occurs to the right of $theta.$
Since $theta in (x_{i-1,j},x_{i,j})$ and $theta in (x_{i,j},x_{i+1,j}),$
we know that $theta < x_{i,j} < theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $theta$ and $theta'.$
So when we go from $F(theta)$ to $F(theta'),$ one of two things happens
as we pass over $x_{i,j}$:
If $y_i = -1,$ we get one more label of this kind on the left side of $theta'$ than on the left of $theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.
If $y_i = 1,$ we get one fewer label of this kind on the right side of $theta'$ than on the right of $theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.
The notation $ - D_i mathbb{1}_{[y_i=1]} + D_i mathbb{1}_{[y_i=-1]}$
says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$
If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $theta$ to $theta'.$
edited Jan 26 at 13:08
answered Jan 26 at 1:48
David KDavid K
55.2k344120
55.2k344120
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3087210%2fwhy-can-i-write-this-function-in-terms-of-its-previous-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
Presumably, the term "value of the objective" has been defined before, so you should know what $F(theta)$ is. What is it?
$endgroup$
– David K
Jan 25 at 16:02
$begingroup$
The term has not been defined previously in the book in this context. This is the first time it appears. cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html
$endgroup$
– Oliver G
Jan 25 at 16:07
$begingroup$
What about the word "objective"? The way it is used here, it seems you're supposed to understand what the function $F(theta)$ is before you read the equation.
$endgroup$
– David K
Jan 25 at 16:20
$begingroup$
I'm guessing it means the value of $(10.1)$ with $theta = theta$ instead of min$_{theta}$. But if I assume that, then I still don't see the inequality in my question.
$endgroup$
– Oliver G
Jan 25 at 16:23