How does accuracy of a survey depend on sample size and population size?
Which survey is more accurate? Assume the samples are taken perfectly randomly.
A sample of 100 people out of a population of 1000 (sample is 10% of population)
A sample of 1000 people out of a population of 1000000 (sample is 0.1% of population)
I remember my lecturer saying something like "when the sample size is small compared to the population, the accuracy depends almost entirely on the sample size; the population size is unimportant". Is there a name for that result? It's quite surprising at first.
I'd love to see some graphs of these functions.
If it helps, here's a concrete example (made up by me).
An unknown proportion $p$ of the population favour candidate Alice. The rest favour Bob. We take a random sample of size $k$ from the population (of size $n$) and ask their preferences, to come up with an estimate $\hat{p}$.
How does the expected error $\mathbb{E}|\hat{p} - p|$ depend on $k$ and $n$? And in the limit $n\to\infty$?
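Here's a quick simulation sketch (just an illustration using numpy's hypergeometric sampler; the function name is mine) that estimates $\mathbb{E}|\hat{p} - p|$ for the two scenarios:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_abs_error(pop_size, sample_size, p=0.5, trials=100_000):
    """Monte Carlo estimate of E|p_hat - p| under simple random sampling
    without replacement (the sample counts follow a hypergeometric law)."""
    good = int(round(p * pop_size))              # voters favouring Alice
    hits = rng.hypergeometric(good, pop_size - good, sample_size, size=trials)
    return np.mean(np.abs(hits / sample_size - good / pop_size))

print(expected_abs_error(1_000, 100))        # 100 of 1,000: roughly 0.04
print(expected_abs_error(1_000_000, 1_000))  # 1,000 of 1,000,000: roughly 0.013
```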
Tags: statistics, sampling
asked Sep 10 '14 by Colonel Panic (edited Jan 10)
Look up 'finite population correction factor'.
– copper.hat, Sep 11 '14
Population size doesn't determine the sample size. The more important thing is how much variability is present in the population. For example, if you go to buy a sack of rice, you take the same sample size as when you buy only 3-4 kg of rice, because you believe the variability in the sack is the same as the variability in the 3-4 kg of rice.
– user 31466, Jul 11 '15
2 Answers
Since you asked for graphs, I will make your example even a little more specific. Suppose that in the sample we take, $60\%$ of the voters prefer Alice. For a sample size of $100,$ that means we had $60$ responses for Alice. So it looks like Alice is ahead of Bob, doesn't it? But how sure are we?

To assess the accuracy of our poll, we can try setting up a one-sided $95\%$ confidence interval by finding the smallest percentage of the population that might prefer Alice such that there is at least a $5\%$ chance that we would have gotten at least $60$ responses favoring Alice out of $100$ total responses.

If the population size were $100,$ we would have polled the entire population, and the confidence interval would start at $60\%$ (zero error). If the size of the population were just $105$, of whom only $61$ ($58.1\%$) favored Alice, there would still be better than a $5\%$ chance that we could get $60$ or more responses for Alice; but if fewer people favored Alice then there would be less than a $5\%$ chance of $60$ favorable responses. So we set the $95\%$ confidence interval at $58.1\%$ or greater, which is a maximum error of $1.9\%$ in the direction of overcounting (that is, our estimate could be that much higher than the true percentage).

As the population size increases, the maximum error (in one direction) increases rapidly at first but soon starts to level off. The chart below shows errors for population sizes up to $3000$. For a population of $1000$, our sample of $100$ (with $60$ favorable responses) has a (one-sided) error of $8.2\%$ at the $95\%$ confidence level. But the error grows very slowly after that. In fact it will never be larger than about $8.7\%$, even for populations of a million or more. There is some effect of increasing the population size above $1000$ in this example, but it is not very dramatic.
On the other hand, suppose we have a population of one million voters, take a sample of $k$ voters from this population, and find that $60\%$ of them prefer Alice. If $k=100$ then we have a one-sided error of about $8.7\%$ at the $95\%$ confidence level, but if we make $k$ larger, the error decreases as shown in the chart below. For $k=1000$ the error is only about $2.6\%$. That is, while for a sample of $100$ it made little difference whether the population was $1000$ or one million, it makes quite a lot of difference whether the sample size is $100$ or $1000$.
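The sample-size effect can be recomputed the same way, reusing lower_bound from the sketch above with a fixed population of one million (again just an illustration; it assumes the observed share stays at $60\%$):

```python
# One-sided error for a fixed population of one million, as the sample grows.
for k in (100, 250, 500, 1000, 2000):
    favorable = int(0.60 * k)            # keep the observed share at 60%
    err = 0.60 - lower_bound(1_000_000, sample_size=k, favorable=favorable)
    print(k, round(err, 4))              # k=100 and k=1000 match the 8.7% and 2.6% above
```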
We could do a similar analysis with two-sided confidence intervals, where we look at possible errors in either direction. The results would tell much the same story: little degradation in accuracy for the $100$-voter sample as the population grows above $1000,$ but great improvement in accuracy when we increase the sample size.

That's what your lecturer meant by saying the accuracy depends almost entirely on sample size when the sample size is small relative to the population. The only time the population size made much difference in our example (changing the bounds of the confidence interval by more than one percentage point) was when the sample was considerably more than $10\%$ of the entire population.

– David K (answered Jul 12 '15, edited Jul 16 '15)
Note: For convenience, in the following I use $N$ for the size of the population and $n$ for the sample size.

In order to answer the OP's questions we start with some preliminary work and describe the current situation in somewhat more detail.

Current situation:

Here we have simple random sampling, meaning that every possible combination of $n$ units from a population of size $N$ is equally likely to be the sample selected. We are in a sampling situation where the object is to estimate the proportion of units in a population having some attribute. In such a situation, the variable of interest is an indicator variable: $y_i=1$ if unit $i$ has the attribute, and $y_i=0$ if it does not.

Writing $p$ for the proportion in the population of size $N$ with the attribute
\begin{align*}
p=\frac{1}{N}\sum_{i=1}^{N}y_i=\mu
\end{align*}
the finite population variance is
\begin{align*}
\sigma^2&=\frac{\sum_{i=1}^{N}(y_i-p)^2}{N-1}=\frac{\sum_{i=1}^{N}y_i^2-Np^2}{N-1}
=\frac{Np-Np^2}{N-1}\\
&=\frac{N}{N-1}p(1-p)
\end{align*}

Now letting $\hat{p}$ denote the proportion in the sample of size $n$ with the attribute
\begin{align*}
\hat{p}=\frac{1}{n}\sum_{i=1}^{n}y_i=\bar{y}
\end{align*}
the sample variance is
\begin{align*}
s^2&=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}=\frac{\sum_{i=1}^{n}y_i^2-n\hat{p}^2}{n-1}\\
&=\frac{n}{n-1}\hat{p}(1-\hat{p})
\end{align*}

Note the sample proportion is the sample mean of a simple random sample; it is unbiased for the population proportion and has variance
\begin{align*}
\operatorname{var}(\hat{p})=\frac{N-n}{N-1}\cdot\frac{p(1-p)}{n}\tag{1}
\end{align*}
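As a quick sanity check of $(1)$ (a sketch of mine, not from the answer, using numpy's hypergeometric sampler), one can compare the simulated variance of $\hat{p}$ with the formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check var(p_hat) = (N - n)/(N - 1) * p(1 - p)/n by simulation.
N, n, p = 1_000, 100, 0.3
good = int(p * N)                                  # units carrying the attribute
hits = rng.hypergeometric(good, N - good, n, size=200_000)
p_hat = hits / n

print("simulated:", p_hat.var())
print("formula:  ", (N - n) / (N - 1) * p * (1 - p) / n)
```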
Before we can answer the OP's questions we have to do some general

Considerations regarding accuracy:

Suppose that one wishes to estimate a population parameter $\theta$ (for example the population mean, total, or proportion of an attribute of the units of the population) with an estimator $\hat{\theta}$. Then we would wish the estimate to be close to the true value with high probability. So, specifying a maximum allowable difference $d$ between the estimate and the true value and allowing for a small probability $\alpha$ that the error may exceed that difference, the challenge is to choose a sample size $n$ such that
\begin{align*}
P(|\hat{\theta}-\theta|>d)<\alpha\tag{2}
\end{align*}

If the estimator $\hat{\theta}$ is an unbiased, normally distributed estimator of $\theta$, then $\frac{\hat{\theta}-\theta}{\sqrt{\operatorname{var}(\hat{\theta})}}$ has a standard normal distribution. Letting $z$ denote the upper $\frac{\alpha}{2}$ point of the standard normal distribution yields
\begin{align*}
P\left(\frac{|\hat{\theta}-\theta|}{\sqrt{\operatorname{var}(\hat{\theta})}}>z\right)
=P\left(|\hat{\theta}-\theta|>z\sqrt{\operatorname{var}(\hat{\theta})}\right)=\alpha
\end{align*}

Now, since $d$ and the expression (2) provide us with a precise idea of accuracy, we are ready to harvest. Observe that the variance of the estimator $\hat{\theta}$ decreases with an increasing sample size $n$, so that the inequality above will be satisfied if we can choose $n$ large enough to make
\begin{align*}
z\sqrt{\operatorname{var}(\hat{\theta})}\leq d\tag{3}
\end{align*}
These are the relevant parameters to deal with accuracy. Next we consider

Sample size $n$ for estimating a proportion:

To obtain an estimator $\hat{p}$ having probability at least $1-\alpha$ of being no farther than $d$ from the population proportion, the sample size formula based on the normal approximation gives, according to (1) and (3),
\begin{align*}
\operatorname{var}(\hat{p})&=\frac{d^2}{z^2}\\
\frac{N-n}{N-1}\cdot\frac{p(1-p)}{n}&=\frac{d^2}{z^2}
\end{align*}
Setting $n_0=\frac{z^2}{d^2}p(1-p)$ and solving for $n$, we obtain
\begin{align*}
n=\frac{1}{\frac{N-1}{N}\cdot\frac{1}{n_0}+\frac{1}{N}}\tag{4}
\end{align*}

Note that the formula depends on the unknown population proportion $p$. Since no estimate of $p$ is available, a worst-case value of $p=\frac{1}{2}$ can be used in determining the sample size. This approach is justified since the quantity $p(1-p)$, and hence the value of $n$, assumes its maximum value when $p=\frac{1}{2}$.

Note: When $N$ is large compared with the sample size $n$, formula (4) reduces to
\begin{align*}
n\simeq \lim_{N\rightarrow \infty}\frac{1}{\frac{N-1}{N}\cdot\frac{1}{n_0}+\frac{1}{N}}=n_0
\end{align*}
Since then $n=n_0$ we obtain
\begin{align*}
n=\frac{z^2}{d^2}p(1-p)\tag{5}
\end{align*}
and we see, in accordance with the OP's lecturer, that when the sample size $n$ is small compared with the population size, the accuracy $d$ depends on the sample size only.

With regard to the OP's question about a name, I'm not aware of a specific term for this circumstance, but the factor $\frac{N-n}{N-1}$ in $(1)$ is commonly called the finite population correction.
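To illustrate (my own sketch, not part of the original answer; the function name is illustrative), formula (4) and its large-$N$ limit (5) can be evaluated directly:

```python
from math import ceil
from scipy.stats import norm

def sample_size(N, d, alpha=0.05, p=0.5):
    """Required sample size from formula (4); p = 0.5 is the worst case."""
    z = norm.ppf(1 - alpha / 2)              # upper alpha/2 point of the normal
    n0 = z**2 / d**2 * p * (1 - p)           # infinite-population size, formula (5)
    n = 1 / ((N - 1) / N / n0 + 1 / N)       # finite-population size, formula (4)
    return ceil(n), ceil(n0)

print(sample_size(1_000, d=0.03))      # the finite population correction matters here
print(sample_size(1_000_000, d=0.03))  # essentially the infinite-population value n0
```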
Which scenario is more accurate:

To answer this question we now transform (4) to obtain the difference $d$
\begin{align*}
d=z\sqrt{\frac{N-n}{(N-1)n}p(1-p)}
\end{align*}
Estimating the true proportion with probability $0.95$ ($\alpha=0.05$) and taking the worst-case value $p=0.5$, we obtain the formula
\begin{align*}
d=1.96\sqrt{\frac{N-n}{(N-1)n}\cdot\frac{1}{2}\cdot\frac{1}{2}}=0.98\sqrt{\frac{N-n}{(N-1)n}}
\end{align*}
We observe in case 1: $N=1000, n=100$
\begin{align*}
d=0.98\sqrt{\frac{900}{999\cdot 100}}\simeq 0.0930
\end{align*}
and in case 2: $N=1000000, n=1000$
\begin{align*}
d=0.98\sqrt{\frac{999000}{999999\cdot 1000}}\simeq 0.0310
\end{align*}
and conclude that the accuracy of case 2 is greater than that of case 1, provided the interpretation follows the modeling above.
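The two values can be checked in a couple of lines (again only a sketch of mine):

```python
def worst_case_error(N, n, z=1.96):
    # d = z * sqrt((N - n) / ((N - 1) * n) * 1/4), the worst-case margin at p = 0.5
    return z * ((N - n) / ((N - 1) * n) * 0.25) ** 0.5

print(worst_case_error(1_000, 100))        # about 0.093  (case 1)
print(worst_case_error(1_000_000, 1_000))  # about 0.031  (case 2)
```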
Note: This answer is mostly based upon Sampling by Steven K. Thompson, Chapter 5: Estimating Proportions, Ratios, and Subpopulation Means.

– Markus Scheuer (answered Jul 11 '15, edited Jul 11 '15)