Why does non-parametric bootstrap not return the same sample over and over again?

6

My notes say:

Assume data $X_1, \dots, X_n$.

Sample the data with replacement to produce $X_1^{(p)}, \dots, X_n^{(p)}$.

Since both have length $n$, how does this not always produce the same sample? I'm missing something.

asked Nov 21 '18 at 10:36

mavavilj

bootstrap


3 Answers

13

Each member of the bootstrap sample is drawn at random, with replacement, from the data set. If we drew $n$ points without replacement, every sample would simply be a re-ordering of the same data. Because of replacement, the bootstrap samples differ in how many times they include each data point (once, several times, or not at all). On average, about 63% of the data points appear at least once in a given bootstrap sample: the expected fraction is $1-(1-1/n)^n$, which tends to $1-1/e \approx 0.632$ as $n$ grows.
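The ~63% figure can be checked with a quick simulation. The following is a minimal Python sketch, not part of the original answer; the function names, the sample size of 100 and the 2000 replications are all arbitrary choices made for illustration. It compares the simulated share of data points that appear at least once with the closed form $1-(1-1/n)^n$.

```python
import random

def fraction_included(n, rng):
    """Draw one bootstrap sample of size n (as indices) and return the
    share of the n original indices that appear at least once."""
    sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    return len(set(sample)) / n

def mean_fraction_included(n, replications, seed=0):
    """Average the included share over many bootstrap samples."""
    rng = random.Random(seed)
    total = sum(fraction_included(n, rng) for _ in range(replications))
    return total / replications

def expected_fraction(n):
    """Probability that a given point is drawn at least once in n draws."""
    return 1 - (1 - 1 / n) ** n

# Illustrative values only: n = 100 points, 2000 bootstrap samples.
print(mean_fraction_included(100, 2000), expected_fraction(100))
```

Both printed numbers should land close to 0.63; the match gets tighter as the number of replications grows.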

answered Nov 21 '18 at 12:05

user20160


1

@user20160's explanation is fine. Here's an example of 10 bootstrap samples of the sequence from 1 to 5, showing that some values will be represented more than once and other values will not be represented:

    x <- 1:5; t(replicate(10,sort(sample(x,replace=TRUE))))

          [,1] [,2] [,3] [,4] [,5]
     [1,]    2    2    4    4    5
     [2,]    1    1    1    2    4
     [3,]    3    3    3    5    5
     [4,]    1    1    1    2    3
     [5,]    1    1    2    3    3
     [6,]    1    2    3    4    4
     [7,]    2    2    3    4    5
     [8,]    3    3    3    4    4
     [9,]    1    1    2    3    5
    [10,]    1    1    2    4    4
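For readers who don't use R, here is a rough Python counterpart, offered as a sketch rather than as part of the original answer. The helper name `resample` and the seed are made up for illustration. It also adds the contrast the question hinges on: without replacement every sorted resample equals the original data, while with replacement the rows vary.

```python
import random

def resample(data, replace, rng):
    """Return a sorted resample of len(data) points drawn from data."""
    if replace:
        draws = rng.choices(data, k=len(data))  # with replacement
    else:
        draws = rng.sample(data, k=len(data))   # without replacement: a permutation
    return sorted(draws)

rng = random.Random(1)  # arbitrary seed, only for reproducibility
x = [1, 2, 3, 4, 5]

without = [resample(x, False, rng) for _ in range(10)]
with_repl = [resample(x, True, rng) for _ in range(10)]

print("without replacement:", without[0])  # always the original values
for row in with_repl:
    print(row)  # repeats and omissions show up here
```

Sorting each row, as the R one-liner does, makes it easy to see that the without-replacement rows are all identical.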

answered Nov 21 '18 at 17:42

Ben Bolker


0

To confirm the answers here: the key misunderstanding seems to be that the questioner assumes the sampling is done without replacement. In that case, with 10 elements, 10 draws per replication and 2 replications, the two replications contain exactly the same values (only the order can differ), and the number of draws can never exceed the original sample size.

With replacement, however, the number of draws could in theory exceed the number of elements, so each resample could be made as large as you like. In practice this would be a mistake: the mean of the resamples would stay the same, but you would artificially lower the variance of the bootstrap estimate.
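The effect of oversized resamples can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the simulated standard-normal data, the seed, the 2000 replications and the factor of 10. It compares the spread of bootstrap means when each resample has $n$ draws against resamples of $10n$ draws.

```python
import random
import statistics

def bootstrap_means(data, draws_per_sample, replications, rng):
    """Mean of each resample; draws_per_sample may differ from len(data)."""
    return [
        statistics.fmean(rng.choices(data, k=draws_per_sample))
        for _ in range(replications)
    ]

rng = random.Random(42)                          # arbitrary seed
data = [rng.gauss(0, 1) for _ in range(30)]      # made-up data, n = 30

means_n = bootstrap_means(data, len(data), 2000, rng)         # usual bootstrap
means_10n = bootstrap_means(data, 10 * len(data), 2000, rng)  # oversized resamples

sd_n = statistics.stdev(means_n)
sd_10n = statistics.stdev(means_10n)
print(statistics.fmean(data), statistics.fmean(means_n), statistics.fmean(means_10n))
print(sd_n, sd_10n)
```

Both sets of bootstrap means centre on the sample mean, but the spread for the oversized resamples is smaller by roughly $\sqrt{10}$, which would understate the uncertainty of a mean computed from only 30 observations.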

To clarify, the correct way to stabilise the estimates of both the mean and the variance is to increase the number of replications, not the size of each resample. I'll refrain from elaborating.

As an aside, the nonparametric bootstrap is handy when you have no idea how to derive the 95% confidence interval of the mean analytically: sort the bootstrap means and discard the upper and lower 2.5%. The technique has its critics, however.
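The "sort and discard 2.5% from each end" recipe might look like the following in Python. This is a minimal sketch of a simple percentile interval under stated assumptions: the function name `percentile_ci`, the default of 5000 replications and the example data are all invented for illustration, and other quantile conventions or interval types would give slightly different endpoints.

```python
import random
import statistics

def percentile_ci(data, replications=5000, level=0.95, seed=0):
    """Simple percentile bootstrap interval for the mean of data."""
    rng = random.Random(seed)
    n = len(data)
    # One mean per bootstrap sample of n draws with replacement, then sort.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(replications)
    )
    # Number of sorted means to discard from each tail (2.5% each for 95%).
    cut = round((1 - level) / 2 * replications)
    return means[cut], means[replications - cut - 1]

# Made-up observations, purely for demonstration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
lower, upper = percentile_ci(data)
print(lower, statistics.fmean(data), upper)
```

Raising `replications` makes the endpoints more stable from run to run, in line with the point above about increasing the number of replications rather than the resample size.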

answered Nov 22 '18 at 0:59

Michael G.
