incremental computation of standard deviation
How can I compute the standard deviation in an incremental way (using the new value and the last computed mean and/or standard deviation)?

For the non-incremental way, I just do something like:

    mean = Mean(list)
    stdev = 0
    for i = 0 to list.size - 1
        stdev = stdev + (list[i] - mean)^2
    stdev = sqrRoot( stdev / list.size )

statistics algorithms computational-mathematics standard-deviation
edited Jan 30 at 1:33 by Alexander Gruber♦
asked Jan 27 '12 at 15:12 by shn
The formula for this is a special case of the sample variance decomposition formula given in O'Neill (2014) (Result 1). – Ben, Jan 31 at 3:54
6 Answers
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, \ldots$ be an iid sequence of random variables with $\bar X_n = n^{-1} \sum_{j=1}^n X_j$, and let $s^2_n$ similarly denote the $n$th sample variance (I use a denominator of $n-1$ instead of the $n$ in your question to keep things unbiased for the variance, but you can use the same argument, just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = \frac{\sum_{j=1}^n (X_j - \bar X_n)^2}{n-1}
= \frac{\sum_{j=1}^n (X_j - \bar X_{n-1} + \bar X_{n-1} - \bar X_n)^2}{n-1}.
$$
Expand this to get
$$
s^2_n = \frac{(n-2)s^2_{n-1} + (n-1)(\bar X_{n-1} - \bar X_n)^2 + 2\sum_{j=1}^{n-1}(X_j - \bar X_{n-1})(\bar X_{n-1} - \bar X_n) + (X_n - \bar X_n)^2}{n-1},
$$
and it is easy to show that the summation term above is equal to $0$, which gives
$$
s^2_n = \frac{(n-2)s^2_{n-1} + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2}{n-1}.
$$
EDIT: I assumed you already have an incremental expression for the sample mean. That is much easier to get: $\bar X_n = n^{-1}[X_n + (n-1)\bar X_{n-1}]$.

answered Jan 27 '12 at 17:45 by guy (edited Jan 27 '12 at 18:03)
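A minimal Python sketch of this recursion (the function name `update_stats` and the way the state is threaded through are my own, not from the answer), carrying the count, the running mean, and the running sample variance:

```python
def update_stats(n, mean, s2, x_new):
    """Incremental update of the sample mean and sample variance (denominator n-1).

    n     -- number of samples seen so far (before x_new)
    mean  -- sample mean of the first n values
    s2    -- sample variance of the first n values (0.0 while n < 2)
    x_new -- the incoming value
    Returns (n + 1, new_mean, new_s2).
    """
    n_new = n + 1
    # xbar_n = [X_n + (n-1) xbar_{n-1}] / n
    new_mean = (x_new + n * mean) / n_new
    if n_new < 2:
        return n_new, new_mean, 0.0
    # s2_n = [(n-2) s2_{n-1} + (n-1)(xbar_{n-1} - xbar_n)^2 + (X_n - xbar_n)^2] / (n-1)
    new_s2 = ((n_new - 2) * s2
              + (n_new - 1) * (mean - new_mean) ** 2
              + (x_new - new_mean) ** 2) / (n_new - 1)
    return n_new, new_mean, new_s2
```

Feeding the values 1, 2, 3 through this gives a sample variance of 1, matching the direct two-pass computation.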
Nice answer. Note that your formula can be simplified to $$s_n^2 = \frac{n-2}{n-1}s^2_{n-1} + \frac{1}{n}(X_n - \bar X_{n-1})^2.$$ – user940, Jan 27 '12 at 18:49

Thanks Guy for the great explanation (2); it is exactly what I was looking for. But can you please elaborate on how the summation term equals zero, as I seem to be missing a trick... – user55490, Jan 7 '13 at 18:09

@user55490: $$\sum_{j=1}^{n-1} (X_j - \bar X_{n-1}) = \sum_{j=1}^{n-1} X_j - (n-1)\bar X_{n-1} = (n-1)(\bar X_{n-1} - \bar X_{n-1}) = 0.$$ – Jean Hominal, Oct 26 '15 at 14:24

See also this Wikipedia post. – Geoffrey De Smet, Feb 1 '17 at 9:35

Please note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (delta degrees of freedom), the formula becomes $$s^2_n = \frac{(n-1-d)s^2_{n-1} + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2}{n-d},$$ where $d$ is the DDOF: usually $d = 0$ for the population variance and $d = 1$ for the sample variance. – user3019105, Feb 24 '18 at 18:41
The standard deviation is a function of the totals $T_{\alpha} = \sum_{i=1}^{N} x_i^{\alpha}$ for $\alpha = 0, 1, 2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X] = T_1/T_0$ and $E[X^2] = T_2/T_0$, and the standard deviation is
$$
\sigma = \sqrt{\text{Var}[X]} = \sqrt{E[X^2] - E[X]^2} = \frac{1}{T_0}\sqrt{T_0 T_2 - T_1^2}.
$$
By maintaining totals of higher powers ($T_{\alpha}$ for $\alpha \ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.

answered Jan 27 '12 at 15:37
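As a rough illustration (the class name `RunningStd` and its method names are mine, not from the answer), maintaining $T_0, T_1, T_2$ and reading off the population standard deviation might look like:

```python
import math

class RunningStd:
    """Running population standard deviation from the totals T0, T1, T2."""

    def __init__(self):
        self.t0 = 0      # count of samples (T0)
        self.t1 = 0.0    # sum of samples (T1)
        self.t2 = 0.0    # sum of squared samples (T2)

    def add(self, x):
        self.t0 += 1
        self.t1 += x
        self.t2 += x * x

    def std(self):
        # sigma = sqrt(E[X^2] - E[X]^2) = sqrt(T0*T2 - T1^2) / T0
        # (call only after at least one sample has been added)
        return math.sqrt(self.t0 * self.t2 - self.t1 ** 2) / self.t0
```

For the values 1, 2, 3 this yields $\sqrt{2/3} \approx 0.8165$, the population standard deviation discussed in the comments below.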
I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this. – guy, Jan 27 '12 at 17:52

This is actually computationally efficient, because you just need to keep the number of samples, the sum of the samples, and the sum of the squares of the samples. Once you need to "read" the standard deviation (or average), you calculate the square root, which costs two multiplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly). – Jon Watte, Aug 1 '14 at 3:39

Furthermore, this method allows you to easily compute a running standard deviation as well, which is what I was looking for: in addition to adding the new value to the sum of values and the sum of squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue, and I hope it will let future people find this more easily. – Quentin, Feb 11 '16 at 16:31

I have tried to implement this in R. However, I seem to be doing something wrong, as I don't get correct standard deviation values. E.g., for the three x values 1, 2, and 3 the standard deviation should be 1. $T_0 = 3$ ($1^0 + 2^0 + 3^0$), $T_1 = 6$ ($1^1 + 2^1 + 3^1$), $T_2 = 14$ ($1^2 + 2^2 + 3^2$). The final standard deviation value according to the formula above would be 0.8165, which is different from 1. Could you tell me where I am making a mistake? – Phil, Oct 18 '18 at 16:00

@Phil: the formula I gave, $\sqrt{E[X^2] - E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $\sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $\sqrt{3/2}$, giving the value of $1$ that you were expecting. – mjqxxxx, Oct 22 '18 at 18:20
What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance, and thus the standard deviation, of a sequence of data, documented here.
Just for the record, the algorithm in the linked Wikipedia entry was implemented in an answer two years later. – Lee David Chung Lin, Feb 14 at 14:28
Another way of saying the above is that you need to keep a count, an incremental sum of values, and an incremental sum of squares.
Let $N$ be the count of values seen so far, $S = \sum_{i=1}^{N} x_i$ and $Q = \sum_{i=1}^{N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally).
Then at any stage, the mean is $\frac{S}{N}$ and the variance is $\frac{Q}{N} - \left(\frac{S}{N}\right)^2$.
An advantage of this method is that it is less prone to rounding errors after a long series of calculations.
This is the best way to achieve the goal: single-pass calculation with a fixed amount of storage required. But is it mathematically valid? – Edward Ross, Aug 25 '16 at 13:12

Indeed (let $X = S/N$ be the mean): $$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - X)^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i^2 - 2x_i X + X^2) = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\sum_{i=1}^{N} 2x_i X + \frac{1}{N}\sum_{i=1}^{N} X^2 = Q/N - 2X^2 + X^2 = Q/N - X^2 = Q/N - (S/N)^2.$$ – dennyrolling, Apr 26 '17 at 20:52
Forgive my poor math background; what I needed was the details! I've added my derivation here for anyone like me.
$$
s_n^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}{n-1} \\
= \frac{\sum_{i=1}^{n}(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac{\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^{n}(\bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac{\left(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2\right) + \left(2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)\right) + \sum_{i=1}^{n}(\bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac{(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + n(\bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac{(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2}{n-1} \\
= \frac{(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2}{n-1} \\
= \frac{(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)}{n-1} \\
= \frac{(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2}{n-1} \\
= \frac{n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac{(x_n - \bar{x}_n)^2}{n-1},
$$
and
$$
(\bar{x}_{n-1} - \bar{x}_n)^2 + \frac{(x_n - \bar{x}_n)^2}{n-1} \\
= \left(\bar{x}_{n-1} - \frac{x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2 + \frac{1}{n-1}\left(x_n - \frac{x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2 \\
= \frac{1}{n}(x_n - \bar{x}_{n-1})^2,
$$
so
$$
s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2.
$$
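The final simplified recursion can be sketched in Python as follows (the helper name `next_sample_var` is mine, not from the answer):

```python
def next_sample_var(n, prev_mean, prev_s2, x):
    """One step of s_n^2 = (n-2)/(n-1) * s_{n-1}^2 + (x_n - xbar_{n-1})^2 / n.

    n is the count *including* x (requires n >= 2); prev_mean and prev_s2 are
    the sample mean and sample variance of the first n-1 values.
    Returns (new_mean, new_s2).
    """
    s2 = (n - 2) / (n - 1) * prev_s2 + (x - prev_mean) ** 2 / n
    mean = (x + (n - 1) * prev_mean) / n  # xbar_n = [x_n + (n-1) xbar_{n-1}] / n
    return mean, s2
```

Starting from a single value 1 (mean 1, variance 0) and adding 2 and then 3 reproduces the expected sample variances 0.5 and 1.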
Awesome! Thanks! – user3019105, Feb 24 '18 at 18:33
If it is of any help, I found what seems to be a much nicer way here: Wikipedia – online algorithm.
The following is how I have implemented it in Java. It has worked very well for me. I hope it is clear enough; please feel free to ask me questions about it.

    static double sum_avg = 0.0, sum_sd = 0.0, avg = 0.0; // running state shared by both methods

    /**
     * standardDeviation() - incrementally calculates the standard deviation of a data set
     * using the last entered value and the previous sum of differences to the mean.
     * (i.e., upon adding a value to the data set this function should immediately be called)
     *
     * NOTE: do not call this function if the data set size is less than 2, since standard
     *       deviation cannot be calculated on a single value
     * NOTE: sum_avg, sum_sd and avg are all static variables
     * NOTE: on attempting to use this on another set following previous use, the static
     *       values will have to be reset
     *
     * @param vector - List<Double> - data with only one additional value since the previous call
     * @return updated value for the standard deviation
     */
    public static double standardDeviation(List<Double> vector)
    {
        double N = (double) vector.size();  // size of the data set
        double oldavg = avg;                // save the old average
        avg = updateAverage(vector);        // update the new average
        if (N == 2.0)                       // with only two values, use the standard formula
        {
            for (double d : vector)         // cycle through the 2 elements of the data set
            {
                sum_sd += Math.pow(d - avg, 2); // accumulate squared deviations from the mean
            }
        }
        else if (N > 2)                     // once we have calculated the base sum_sd
        {
            double newel = vector.get(vector.size() - 1); // get the latest addition to the data set
            sum_sd = sum_sd + (newel - oldavg) * (newel - avg); // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
        }
        return Math.sqrt(sum_sd / N);       // N or N-1 depending on your choice of population or sample standard deviation
    }

    /**
     * Simplistic method for incrementally calculating the mean of a data set.
     *
     * @param vector - List<Double> - data with only one additional value since the previous call
     * @return updated value for the mean of the given data set
     */
    public static double updateAverage(List<Double> vector)
    {
        if (vector.size() == 2) {
            sum_avg = vector.get(vector.size() - 1) + vector.get(vector.size() - 2);
        }
        else {
            sum_avg += vector.get(vector.size() - 1);
        }
        return sum_avg / (double) vector.size();
    }
This points to exactly the same source as the answer by @oldrinb two years earlier. The good thing here is that this is self-contained, while the older one is just a link (with a keyword reference). – Lee David Chung Lin, Feb 14 at 14:25
$begingroup$
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $bar X = n^{-1} sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = frac{sum_{j = 1} ^ n (X_j - bar X_n)^2}{n - 1}
= frac{sum_{j = 1} ^ n (X_j - bar X_{n - 1} + bar X_{n - 1} + bar X_n)^2}{n - 1}.
$$
Expand this to get $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1) (bar X_{n - 1} - bar X_n)^2 + 2 sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1})(bar X_{n - 1} - bar X_n) + (X_n - bar X_{n} ^ 2)}{n - 1}
$$
and it is easy to show that the summation term above is equal to $0$ which gives $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2
+ (X_n - bar X_{n})^2}{n - 1}.
$$
EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $bar X_n = n^{-1}[X_n + (n-1)bar X_{n-1}]$.
$endgroup$
18
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
1
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
1
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
2
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
1
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
|
show 1 more comment
$begingroup$
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $bar X = n^{-1} sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = frac{sum_{j = 1} ^ n (X_j - bar X_n)^2}{n - 1}
= frac{sum_{j = 1} ^ n (X_j - bar X_{n - 1} + bar X_{n - 1} + bar X_n)^2}{n - 1}.
$$
Expand this to get $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1) (bar X_{n - 1} - bar X_n)^2 + 2 sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1})(bar X_{n - 1} - bar X_n) + (X_n - bar X_{n} ^ 2)}{n - 1}
$$
and it is easy to show that the summation term above is equal to $0$ which gives $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2
+ (X_n - bar X_{n})^2}{n - 1}.
$$
EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $bar X_n = n^{-1}[X_n + (n-1)bar X_{n-1}]$.
$endgroup$
18
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
1
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
1
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
2
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
1
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
|
show 1 more comment
$begingroup$
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $bar X = n^{-1} sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = frac{sum_{j = 1} ^ n (X_j - bar X_n)^2}{n - 1}
= frac{sum_{j = 1} ^ n (X_j - bar X_{n - 1} + bar X_{n - 1} + bar X_n)^2}{n - 1}.
$$
Expand this to get $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1) (bar X_{n - 1} - bar X_n)^2 + 2 sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1})(bar X_{n - 1} - bar X_n) + (X_n - bar X_{n} ^ 2)}{n - 1}
$$
and it is easy to show that the summation term above is equal to $0$ which gives $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2
+ (X_n - bar X_{n})^2}{n - 1}.
$$
EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $bar X_n = n^{-1}[X_n + (n-1)bar X_{n-1}]$.
$endgroup$
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, ...$ be an iid sequence of random variables with $bar X = n^{-1} sum_{j = 1} ^ n X_j$ and $s^2_n$ defined similarly as the $n$'th sample variance (I use a denominator of $n-1$ instead of $n$ in your picture to keep things unbiased for the variance, but you can use the same argument just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = frac{sum_{j = 1} ^ n (X_j - bar X_n)^2}{n - 1}
= frac{sum_{j = 1} ^ n (X_j - bar X_{n - 1} + bar X_{n - 1} + bar X_n)^2}{n - 1}.
$$
Expand this to get $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1) (bar X_{n - 1} - bar X_n)^2 + 2 sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1})(bar X_{n - 1} - bar X_n) + (X_n - bar X_{n} ^ 2)}{n - 1}
$$
and it is easy to show that the summation term above is equal to $0$ which gives $$
s^2_n = frac{(n - 2)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2
+ (X_n - bar X_{n})^2}{n - 1}.
$$
EDIT: I assumed you already have an incremental expression for the sample mean. It is much easier to get that: $bar X_n = n^{-1}[X_n + (n-1)bar X_{n-1}]$.
edited Jan 27 '12 at 18:03
answered Jan 27 '12 at 17:45
guyguy
2,62311827
2,62311827
18
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
1
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
1
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
2
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
1
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
|
show 1 more comment
18
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
1
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
1
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
2
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
1
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
18
18
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
$begingroup$
Nice answer. Note that your formula can be simplified to $$s_n^2={n-2over n-1}s^2_{n-1}+{1over n}(X_n-bar X_{n-1})^2.$$
$endgroup$
– user940
Jan 27 '12 at 18:49
1
1
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
$begingroup$
Thanks Guy for the great explaination (2). it is exactly what i was looking for. but can you please elaborate how the summation term equals zero as i seem to be missing a trick...
$endgroup$
– user55490
Jan 7 '13 at 18:09
1
1
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
$begingroup$
@user55490: $$ sum_{j = 1} ^ {n - 1} (X_j - bar X_{n - 1}) = (sum_{j = 1} ^ {n - 1} (X_j) - (n - 1) bar X_{n - 1}) = (n - 1) (bar X_{n - 1} - bar X_{n - 1})$$
$endgroup$
– Jean Hominal
Oct 26 '15 at 14:24
2
2
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
$begingroup$
See also this wikipedia post
$endgroup$
– Geoffrey De Smet
Feb 1 '17 at 9:35
1
1
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
$begingroup$
Please, note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes: $$s^2_n = frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(bar X_{n - 1} - bar X_n)^2 + (X_n - bar X_{n})^2}{n - d}$$ Where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
$endgroup$
– user3019105
Feb 24 '18 at 18:41
|
show 1 more comment
$begingroup$
The standard deviation is a function of the totals $T_{alpha}=sum_{i=1}^{N}x_{i}^{alpha}$ for $alpha=0,1,2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X]=T_{1}/T_{0}$ and $E[X^2]=T_{2}/T_{0}$, and the standard deviation is
$$
sigma = sqrt{text{Var}[X]} = sqrt{E[X^2]-E[X]^2} = frac{1}{T_0}sqrt{T_{0}T_{2}-T_{1}^2}.
$$
By maintaining totals of higher powers ($T_{alpha}$ for $alpha ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.
$endgroup$
1
$begingroup$
I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this.
$endgroup$
– guy
Jan 27 '12 at 17:52
1
$begingroup$
This is actually computationally efficient, because you just need to count the number of samples, the sum of the samples, and the sum of the squares of each sample. Once you need to "read" the standard deviation (or average,) you calculate the square root, which costs two multuplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly.)
$endgroup$
– Jon Watte
Aug 1 '14 at 3:39
$begingroup$
Furthermore, this methods allows to compute easily a running standard deviation as well, which is what i was looking for. in addition to adding the new value to the sum of values and squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue and I hope it will let future people find this more easily.
$endgroup$
– Quentin
Feb 11 '16 at 16:31
$begingroup$
I have tried to implement this in R. However, I seem to am doing something wrong as I don't get correct standard deviation values. E.g. for the three x values 1, 2, and 3 the standard deviation should be 1. T0 = 3 (1^0 + 2^0 + 3^0), T1 = 6 (1^1 + 2^1 + 3^1), T2 = 14 (1^2 + 2^2 + 3^2). The final standard deviation value according to the formula above would be 0.8165 which is different from 1. Could you tell me where I am making a mistake?
$endgroup$
– Phil
Oct 18 '18 at 16:00
1
$begingroup$
@Phil: the formula I gave, $sqrt{E[X^2]-E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $sqrt{3/2}$, giving the value of $1$ that you were expecting.
$endgroup$
– mjqxxxx
Oct 22 '18 at 18:20
add a comment |
$begingroup$
The standard deviation is a function of the totals $T_{alpha}=sum_{i=1}^{N}x_{i}^{alpha}$ for $alpha=0,1,2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X]=T_{1}/T_{0}$ and $E[X^2]=T_{2}/T_{0}$, and the standard deviation is
$$
sigma = sqrt{text{Var}[X]} = sqrt{E[X^2]-E[X]^2} = frac{1}{T_0}sqrt{T_{0}T_{2}-T_{1}^2}.
$$
By maintaining totals of higher powers ($T_{alpha}$ for $alpha ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.
$endgroup$
1
$begingroup$
I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this.
$endgroup$
– guy
Jan 27 '12 at 17:52
1
$begingroup$
This is actually computationally efficient, because you just need to count the number of samples, the sum of the samples, and the sum of the squares of each sample. Once you need to "read" the standard deviation (or average,) you calculate the square root, which costs two multuplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly.)
$endgroup$
– Jon Watte
Aug 1 '14 at 3:39
$begingroup$
Furthermore, this methods allows to compute easily a running standard deviation as well, which is what i was looking for. in addition to adding the new value to the sum of values and squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue and I hope it will let future people find this more easily.
$endgroup$
– Quentin
Feb 11 '16 at 16:31
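A sketch of the running (windowed) variant this comment describes, in Java (class and method names are my own): the newest value is added to the running sums and, once the window is full, the oldest value is subtracted back out.

```java
import java.util.ArrayDeque;

// Running standard deviation over a fixed-size sliding window,
// maintaining only the window contents, their sum, and their sum of squares.
public class WindowedStdDev {
    private final int capacity;
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private double sum = 0, sumSq = 0;

    public WindowedStdDev(int capacity) { this.capacity = capacity; }

    public void add(double x) {
        window.addLast(x);
        sum += x;
        sumSq += x * x;
        if (window.size() > capacity) {       // evict the oldest value
            double old = window.removeFirst();
            sum -= old;
            sumSq -= old * old;
        }
    }

    public double stdDev() {                  // population std dev of the window
        double n = window.size();
        return Math.sqrt(sumSq / n - (sum / n) * (sum / n));
    }

    public static void main(String[] args) {
        WindowedStdDev w = new WindowedStdDev(3);
        for (double x : new double[]{1, 2, 3, 4}) w.add(x);
        System.out.println(w.stdDev()); // window is {2, 3, 4} after evicting 1
    }
}
```

Note that repeatedly adding and subtracting from the running sums can accumulate floating-point error over very long streams, so this is a sketch rather than a numerically hardened implementation.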
$begingroup$
I have tried to implement this in R. However, I seem to be doing something wrong as I don't get correct standard deviation values. E.g. for the three x values 1, 2, and 3 the standard deviation should be 1. T0 = 3 (1^0 + 2^0 + 3^0), T1 = 6 (1^1 + 2^1 + 3^1), T2 = 14 (1^2 + 2^2 + 3^2). The final standard deviation value according to the formula above would be 0.8165, which is different from 1. Could you tell me where I am making a mistake?
$endgroup$
– Phil
Oct 18 '18 at 16:00
answered Jan 27 '12 at 15:37
– mjqxxxx
$begingroup$
What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance, and thus the standard deviation, of a sequence of data, documented here.
$endgroup$
$begingroup$
Just for the record, the algorithm in the linked Wikipedia entry was implemented in an answer two years later.
$endgroup$
– Lee David Chung Lin
Feb 14 at 14:28
answered Jul 31 '15 at 2:36
– oldrinb
$begingroup$
Another way of saying the above is that you need to keep a count, an incremental sum of values and an incremental sum of squares.
Let $N$ be the count of values seen so far, $S = \sum_{i=1}^{N} x_i$ and $Q = \sum_{i=1}^{N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally).
Then at any stage,
the mean is $\frac{S}{N}$
and the variance is $\frac{Q}{N} - \left( \frac{S}{N} \right)^2$
An advantage of this method is that it is less prone to rounding errors after a long series of calculations.
$endgroup$
$begingroup$
This is the best way to achieve the goal - a single-pass calculation with a fixed amount of storage required. But is it mathematically valid?
$endgroup$
– Edward Ross
Aug 25 '16 at 13:12
$begingroup$
indeed (let $X=S/N$ be the mean): $s^2=\frac{1}{N}\sum_{i=1}^{N} (x_i-X)^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i^2 - 2x_iX+X^2)=\frac{1}{N}\sum_{i=1}^{N} x_i^2-\frac{1}{N}\sum_{i=1}^{N} 2x_iX+\frac{1}{N}\sum_{i=1}^{N} X^2=Q/N-2X^2+X^2=Q/N-X^2=Q/N-(S/N)^2$
$endgroup$
– dennyrolling
Apr 26 '17 at 20:52
answered Jul 31 '15 at 2:32
– Philip Kilby
$begingroup$
Forgive my poor math background; what I needed was the detail!
I added my working here for someone like me.
$$
s_n^2=\frac {\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^n(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^n(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {\left(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2\right) + \left(2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)\right) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2} {n-1} \\
= \frac {n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1}
$$
and
$$
(\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} \\
= \left(\bar{x}_{n-1} - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2 + \frac {\left(x_n - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2}{n-1} \\
= \frac {1}{n} (x_n - \bar{x}_{n-1})^2
$$
so
$$
s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2
$$
$endgroup$
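The closing recurrence can be coded directly; here is a short Java sketch (naming is my own) that updates the sample variance from only the previous mean and previous variance, with no stored history.

```java
// Incremental sample variance via the recurrence
//   s_n^2 = (n-2)/(n-1) * s_{n-1}^2 + (x_n - mean_{n-1})^2 / n
public class IncrementalSampleVariance {
    private long n = 0;
    private double mean = 0;
    private double s2 = 0; // sample variance (n-1 denominator)

    public void add(double x) {
        n++;
        double prevMean = mean;
        mean = prevMean + (x - prevMean) / n; // incremental mean update
        if (n >= 2) {
            s2 = ((double) (n - 2) / (n - 1)) * s2
               + (x - prevMean) * (x - prevMean) / n;
        }
    }

    public double sampleVariance() { return s2; }

    public double sampleStdDev() { return Math.sqrt(s2); }

    public static void main(String[] args) {
        IncrementalSampleVariance v = new IncrementalSampleVariance();
        for (double x : new double[]{1, 2, 3}) v.add(x);
        System.out.println(v.sampleStdDev()); // 1.0 for {1, 2, 3}
    }
}
```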
$begingroup$
Awesome! Thanks!
$endgroup$
– user3019105
Feb 24 '18 at 18:33
answered Sep 2 '17 at 11:38
– secfree
$begingroup$
If it is of any help, I found what seems to be a much nicer way here: Wikipedia - online algorithm.
The following is how I have implemented it in Java. It has worked very well for me. I hope it is clear enough; please feel free to ask me questions about it.
/**
* standardDeviation() - designed to calculate the standard deviation of a data set incrementally by taking the last entered value and the previous sum of differences to the mean recorded.
* (i.e; upon adding a value to the data set this function should immediately be called)
*
* NOTE: do not call this function if the data set size is less than 2, since standard deviation cannot be calculated on a single value
* NOTE: sum_avg, sum_sd and avg are all static variables
* NOTE: on attempting to use this on another set following previous use, the static values will have to be reset**
*
* @param vector - List<Double> - data with only one additional value from previous method call
* @return updated value for the Standard deviation
*/
public static double standardDeviation(List<Double> vector)
{
double N = (double) vector.size(); //size of the data set
double oldavg = avg; //save the old average
avg = updateAverage(vector); //update the new average
if(N==2.0) //if there are only two, we calculate the standard deviation using the standard formula
{
for(double d:vector) //cycle through the 2 elements of the data set - there is no need to use a loop here, the set is quite small to just do manually
{
sum_sd += Math.pow(d - avg, 2); //accumulate squared deviations from the mean (no Math.abs needed; squaring handles the sign)
}
}
else if(N>2) //once we have calculated the base sum_sd
{
double newel = (vector.get(vector.size()-1)); //get the latest addition to the data set
sum_sd = sum_sd + (newel - oldavg)*(newel-avg); //https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
}
return Math.sqrt((sum_sd)/(N)); //N or N-1 depends on your choice of sample or population standard deviation
}
/**
* simplistic method for incrementally calculating the mean of a data set
*
* @param vector - List<Double> - data with only one additional value from previous method call
* @return updated value for the mean of the given data set
*/
public static double updateAverage(List<Double> vector)
{
if(vector.size()==2){
sum_avg = vector.get(vector.size()-1) + vector.get(vector.size()-2);
}
else{
sum_avg += vector.get(vector.size()-1);
}
return sum_avg/(double)vector.size();
}
$endgroup$
$begingroup$
This points to exactly the same source as the answer by @oldrinb two years earlier. The good thing here is that this is self-contained, while the older one is just a link (with a keyword reference).
$endgroup$
– Lee David Chung Lin
Feb 14 at 14:25
add a comment |
$begingroup$
If it is of any help I found what seems to be a much nicer way here wikipedia - online algorithm .
The following is how I have implemented it in java. It has work very well for me. Hope it is clear enough, Please feel free to ask me questions about it.
/**
* standardDeviation() - designed to calculate the standard deviation of a data set incrementally by taking the last entered value and the previous sum of differences to the mean recorded.
* (i.e; upon adding a value to the data set this function should immediately be called)
*
* NOTE: do not call this function if the data set size it less than 2 since standard deviation cannot be calculated on a single value
* NOTE: sum_avg, sum_sd and avg are all static variables
* NOTE: on attempting to use this on another set following previous use, the static values will have to be reset**
*
* @param vector - List<Double> - data with only one additional value from previous method call
* @return updated value for the Standard deviation
*/
public static double standardDeviation(List<Double> vector)
{
double N = (double) vector.size(); //size of the data set
double oldavg = avg; //save the old average
avg = updateAverage(vector); //update the new average
if(N==2.0) //if there are only two, we calculate the standard deviation using the standard formula
{
for(double d:vector) //cycle through the 2 elements of the data set - there is no need to use a loop here, the set is quite small to just do manually
{
sum_sd += Math.pow((Math.abs(d)-avg), 2); //sum the following according to the formula
}
}
else if(N>2) //once we have calculated the base sum_std
{
double newel = (vector.get(vector.size()-1)); //get the latest addition to the data set
sum_sd = sum_sd + (newel - oldavg)*(newel-avg); //https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
}
return Math.sqrt((sum_sd)/(N)); //N or N-1 depends on your choice of sample of population standard deviation
}
/**
* simplistic method for incrementally calculating the mean of a data set
*
* @param vector - List<Double> - data with only one additional value from previous method call
* @return updated value for the mean of the given data set
*/
public static double updateAverage(List<Double> vector)
{
if(vector.size()==2){
sum_avg = vector.get(vector.size()-1) + vector.get(vector.size()-2);
}
else{
sum_avg += vector.get(vector.size()-1);
}
return sum_avg/(double)vector.size();
}
$endgroup$
$begingroup$
This points to exactly same source as the the answer by @oldrinb two years earlier. The good thing here is that this is self-contained while the older one is just a link (with a keyword reference).
$endgroup$
– Lee David Chung Lin
Feb 14 at 14:25
add a comment |
$begingroup$
If it is of any help I found what seems to be a much nicer way here wikipedia - online algorithm .
The following is how I have implemented it in java. It has work very well for me. Hope it is clear enough, Please feel free to ask me questions about it.
static double avg = 0.0, sum_avg = 0.0, sum_sd = 0.0; //running mean, running sum of values, and running sum of squared deviations
/**
* standardDeviation() - incrementally calculates the standard deviation of a data set by combining the last entered value with the previously recorded sum of squared deviations from the mean.
* (i.e., this function should be called immediately after adding a value to the data set)
*
* NOTE: do not call this function if the data set size is less than 2, since standard deviation cannot be calculated for a single value
* NOTE: sum_avg, sum_sd and avg are all static variables
* NOTE: before reusing this on another data set, the static variables have to be reset
*
* @param vector - List<Double> - data with only one additional value since the previous method call
* @return updated value for the standard deviation
*/
public static double standardDeviation(List<Double> vector)
{
double N = (double) vector.size(); //size of the data set
double oldavg = avg; //save the old average
avg = updateAverage(vector); //update the new average
if(N==2.0) //if there are only two values, calculate the standard deviation using the standard formula
{
for(double d:vector) //cycle through the 2 elements of the data set
{
sum_sd += Math.pow(d - avg, 2); //sum the squared deviations from the mean (the original Math.abs(d) would be wrong for negative data)
}
}
else if(N>2) //once the base sum_sd has been calculated
{
double newel = (vector.get(vector.size()-1)); //get the latest addition to the data set
sum_sd = sum_sd + (newel - oldavg)*(newel-avg); //https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
}
return Math.sqrt((sum_sd)/(N)); //divide by N or N-1 depending on whether you want the population or the sample standard deviation
}
/**
* simplistic method for incrementally calculating the mean of a data set
*
* @param vector - List<Double> - data with only one additional value from previous method call
* @return updated value for the mean of the given data set
*/
public static double updateAverage(List<Double> vector)
{
if(vector.size()==2){
sum_avg = vector.get(vector.size()-1) + vector.get(vector.size()-2);
}
else{
sum_avg += vector.get(vector.size()-1);
}
return sum_avg/(double)vector.size();
}
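For comparison, the same Wikipedia update can be packaged without static state. Below is a minimal sketch of Welford's online algorithm; the `RunningStats` class and its method names are my own, not part of the answer above:

```java
//Welford's online algorithm: one pass, no stored list, numerically stable
class RunningStats {
    private long n = 0;        //number of values seen so far
    private double mean = 0.0; //running mean
    private double m2 = 0.0;   //running sum of squared deviations from the mean

    void add(double x) {
        n++;
        double delta = x - mean;  //deviation from the old mean
        mean += delta / n;        //update the mean incrementally
        m2 += delta * (x - mean); //same (x - oldMean)*(x - newMean) update as above
    }

    double mean() { return mean; }

    double populationStdDev() { //divide by N
        return n > 0 ? Math.sqrt(m2 / n) : 0.0;
    }

    double sampleStdDev() { //divide by N - 1
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
}
```

Unlike the static-variable version, each `RunningStats` instance carries its own state, so several data sets can be tracked at once and nothing has to be reset between uses.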
$endgroup$
edited Jul 12 '17 at 12:12
answered Jul 12 '17 at 12:02
KevinCorcor
112
$begingroup$
This points to exactly the same source as the answer by @oldrinb two years earlier. The good thing here is that this is self-contained, while the older one is just a link (with a keyword reference).
$endgroup$
– Lee David Chung Lin
Feb 14 at 14:25