Incremental computation of standard deviation












How can I compute the standard deviation incrementally, using each new value together with the previously computed mean and/or standard deviation?



For the non-incremental way, I just do something like:



$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$$



mean = Mean(list)
stdev = 0
for i = 0 to list.size - 1
    stdev = stdev + (list[i] - mean)^2
stdev = sqrRoot( stdev / list.size )









  • The formula for this is a special case of the sample variance decomposition formula given in O'Neill (2014) (Result 1).
    – Ben, Jan 31 at 3:54
















statistics algorithms computational-mathematics standard-deviation






edited Jan 30 at 1:33 by Alexander Gruber

asked Jan 27 '12 at 15:12 by shnshn












6 Answers


















Answer (41 votes)
I think the easiest way to do this is with an orthogonality trick. I'll show how to incrementally compute the variance instead of the standard deviation. Let $X_1, X_2, \ldots$ be an iid sequence of random variables with $\bar X_n = n^{-1} \sum_{j = 1}^{n} X_j$ and $s^2_n$ defined similarly as the $n$th sample variance (I use a denominator of $n-1$ instead of the $n$ in your formula to keep things unbiased for the variance, but you can use the same argument, just adding $1$ to all the terms with $n$ in them). First write
$$
s^2_n = \frac{\sum_{j = 1}^{n} (X_j - \bar X_n)^2}{n - 1}
= \frac{\sum_{j = 1}^{n} (X_j - \bar X_{n - 1} + \bar X_{n - 1} - \bar X_n)^2}{n - 1}.
$$
Expand this to get $$
s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2 + 2 \sum_{j = 1}^{n - 1} (X_j - \bar X_{n - 1})(\bar X_{n - 1} - \bar X_n) + (X_n - \bar X_n)^2}{n - 1}
$$
and it is easy to show that the summation term above is equal to $0$, which gives $$
s^2_n = \frac{(n - 2)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2
+ (X_n - \bar X_n)^2}{n - 1}.
$$

EDIT: I assumed you already have an incremental expression for the sample mean. That is much easier to get: $\bar X_n = n^{-1}[X_n + (n-1)\bar X_{n-1}]$.
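A direct translation of this recurrence into code might look as follows. This is a sketch in Java; the class and method names are mine, not from the answer, and the only state kept is the running mean and the running sample variance.

```java
// Incremental sample variance via the recurrence derived above.
// Class and method names are illustrative, not from the original answer.
class IncrementalStd {
    private long n = 0;
    private double mean = 0.0; // running mean, \bar X_n
    private double s2 = 0.0;   // running sample variance s^2_n (denominator n - 1)

    void add(double x) {
        n++;
        double oldMean = mean;
        mean = (x + (n - 1) * mean) / n; // \bar X_n = n^{-1}[X_n + (n-1)\bar X_{n-1}]
        if (n > 1) {
            // s^2_n = [(n-2)s^2_{n-1} + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2] / (n-1)
            s2 = ((n - 2) * s2
                  + (n - 1) * (oldMean - mean) * (oldMean - mean)
                  + (x - mean) * (x - mean)) / (n - 1);
        }
    }

    double sampleStd() { return Math.sqrt(s2); }
}
```

For the values 1, 2, 3 this yields a sample standard deviation of 1, matching the direct two-pass computation.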






answered Jan 27 '12 at 17:45 by guy, edited Jan 27 '12 at 18:03
  • Nice answer. Note that your formula can be simplified to $$s_n^2={n-2\over n-1}s^2_{n-1}+{1\over n}(X_n-\bar X_{n-1})^2.$$
    – user940, Jan 27 '12 at 18:49






  • Thanks Guy for the great explanation. It is exactly what I was looking for, but can you please elaborate on how the summation term equals zero? I seem to be missing a trick...
    – user55490, Jan 7 '13 at 18:09






  • @user55490: $$ \sum_{j = 1}^{n - 1} (X_j - \bar X_{n - 1}) = \sum_{j = 1}^{n - 1} X_j - (n - 1) \bar X_{n - 1} = (n - 1)(\bar X_{n - 1} - \bar X_{n - 1}) = 0$$
    – Jean Hominal, Oct 26 '15 at 14:24








  • See also this Wikipedia post
    – Geoffrey De Smet, Feb 1 '17 at 9:35






  • Please note that the simplified formula of @user940 can only be used if we are computing the sample variance. If we want to use a custom DDOF (Delta Degrees Of Freedom), the formula becomes $$s^2_n = \frac{(n - 1 - d)s^2_{n - 1} + (n - 1)(\bar X_{n - 1} - \bar X_n)^2 + (X_n - \bar X_{n})^2}{n - d}$$ where $d$ is the DDOF, usually $d = 0$ for population variance and $d = 1$ for sample variance.
    – user3019105, Feb 24 '18 at 18:41





















Answer (10 votes)
The standard deviation is a function of the totals $T_{\alpha}=\sum_{i=1}^{N}x_{i}^{\alpha}$ for $\alpha=0,1,2$, each of which can be calculated incrementally in an obvious way. In particular, $E[X]=T_{1}/T_{0}$ and $E[X^2]=T_{2}/T_{0}$, and the standard deviation is
$$
\sigma = \sqrt{\text{Var}[X]} = \sqrt{E[X^2]-E[X]^2} = \frac{1}{T_0}\sqrt{T_{0}T_{2}-T_{1}^2}.
$$
By maintaining totals of higher powers ($T_{\alpha}$ for $\alpha \ge 3$), you can derive similar "incremental" expressions for the skewness, kurtosis, and so on.
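As a sketch of this bookkeeping (in Java; the names are mine, not the answer's): maintain $T_0$, $T_1$, $T_2$ and read off the population standard deviation on demand.

```java
// Running totals T_0 (count), T_1 (sum), T_2 (sum of squares); names are illustrative.
class RunningTotals {
    private double t0 = 0, t1 = 0, t2 = 0;

    void add(double x) {
        t0 += 1;     // T_0 grows by x^0 = 1
        t1 += x;     // T_1 grows by x^1
        t2 += x * x; // T_2 grows by x^2
    }

    double mean() { return t1 / t0; }

    // sigma = (1/T_0) * sqrt(T_0*T_2 - T_1^2), the population standard deviation
    double populationStd() { return Math.sqrt(t0 * t2 - t1 * t1) / t0; }
}
```

For the values 1, 2, 3 this gives $\sqrt{6}/3 \approx 0.8165$, the population value discussed in the comments below.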






  • I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this.
    – guy, Jan 27 '12 at 17:52






  • This is actually computationally efficient, because you just need to count the number of samples, the sum of the samples, and the sum of the squares of the samples. Once you need to "read" the standard deviation (or average), you calculate the square root, which costs two multiplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly).
    – Jon Watte, Aug 1 '14 at 3:39












  • Furthermore, this method allows one to easily compute a running standard deviation as well, which is what I was looking for: in addition to adding the new value to the sums of values and squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue and I hope it will help future people find this more easily.
    – Quentin, Feb 11 '16 at 16:31












  • I have tried to implement this in R. However, I seem to be doing something wrong, as I don't get correct standard deviation values. E.g. for the three x values 1, 2, and 3 the standard deviation should be 1. T0 = 3 (1^0 + 2^0 + 3^0), T1 = 6 (1^1 + 2^1 + 3^1), T2 = 14 (1^2 + 2^2 + 3^2). The final standard deviation value according to the formula above would be 0.8165, which is different from 1. Could you tell me where I am making a mistake?
    – Phil, Oct 18 '18 at 16:00






  • @Phil: the formula I gave, $\sqrt{E[X^2]-E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $\sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $\sqrt{3/2}$, giving the value of $1$ that you were expecting.
    – mjqxxxx, Oct 22 '18 at 18:20



















Answer (7 votes)
What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance (and thus the standard deviation) of a sequence of data, documented here.
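The online algorithm referred to is usually attributed to Welford. A minimal Java sketch of its update step (class and method names are my own):

```java
// Welford's online algorithm for the variance, as described in the linked article.
// Names are illustrative.
class Welford {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;        // update the mean first
        m2 += delta * (x - mean); // delta uses the old mean, (x - mean) the new one
    }

    double sampleVariance() { return m2 / (n - 1); }
    double populationVariance() { return m2 / n; }
}
```

Unlike the naive sum-of-squares form, this update never subtracts two large, nearly equal quantities, which is why it is the usual recommendation for floating-point data.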










  • Just for the record, the algorithm in the linked Wikipedia entry was implemented in an answer two years later.
    – Lee David Chung Lin, Feb 14 at 14:28





















Answer (5 votes)
Another way of saying the above is that you need to keep a count, an incremental sum of values, and an incremental sum of squares.

Let $N$ be the count of values seen so far, $S = \sum_{1,N} x_i$ and $Q = \sum_{1,N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally). Then at any stage,

the mean is $\frac{S}{N}$

and the variance is $\frac{Q}{N} - \left( \frac{S}{N}\right)^2$

An advantage of this method is that it needs only a single pass and a fixed amount of storage; note, however, that the $Q/N - (S/N)^2$ form can suffer rounding errors (catastrophic cancellation) when the variance is small relative to the mean.
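As Quentin's comment under an earlier answer notes, the same bookkeeping also supports a running (sliding-window) standard deviation: subtract the evicted value from $S$ and $Q$ as you add the new one. A Java sketch (the names are mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window population standard deviation over the last `capacity` values.
// Names are illustrative.
class WindowStd {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0, sumSq = 0; // S and Q over the current window

    WindowStd(int capacity) { this.capacity = capacity; }

    void add(double x) {
        window.addLast(x);
        sum += x;
        sumSq += x * x;
        if (window.size() > capacity) { // evict the oldest value
            double old = window.removeFirst();
            sum -= old;
            sumSq -= old * old;
        }
    }

    double populationStd() {
        double n = window.size();
        double m = sum / n;
        return Math.sqrt(Math.max(0.0, sumSq / n - m * m)); // clamp tiny negative rounding
    }
}
```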






  • This is the best way to achieve the goal: a single-pass calculation with a fixed amount of storage required. But is it mathematically valid?
    – Edward Ross, Aug 25 '16 at 13:12










  • Indeed (let $\bar x = S/N$ be the mean): $$s^2=\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar x)^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i^2 - 2x_i\bar x+\bar x^2)=\frac{1}{N}\sum_{i=1}^{N} x_i^2-\frac{2\bar x}{N}\sum_{i=1}^{N} x_i+\bar x^2=Q/N-2\bar x^2+\bar x^2=Q/N-\bar x^2=Q/N-(S/N)^2$$
    – dennyrolling, Apr 26 '17 at 20:52



















Answer (4 votes)
Forgive my poor math background; what I need is detail!

I've added my progress here for someone like me.



$$
s_n^2=\frac {\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^n(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^n(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {\left(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2\right) + \left(2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)\right) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2} {n-1} \\
= \frac {n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1}
$$

and

$$
(\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} \\
= \left(\bar{x}_{n-1} - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2 + \frac {\left(x_n - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2}{n-1} \\
= \frac {1}{n} (x_n - \bar{x}_{n-1})^2
$$

so

$$
s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2
$$
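The final recurrence, together with the incremental mean, is all the state one needs. A Java sketch (names are mine, not from the answer):

```java
// s_n^2 = (n-2)/(n-1) * s_{n-1}^2 + (x_n - \bar x_{n-1})^2 / n, plus the running mean.
// Names are illustrative.
class RecurrenceStd {
    private long n = 0;
    private double mean = 0.0;
    private double s2 = 0.0; // sample variance (denominator n - 1)

    void add(double x) {
        n++;
        if (n == 1) { mean = x; return; }
        double prevMean = mean;
        // for n = 2 the first term vanishes, so s2 starts from the pair {x_1, x_2}
        s2 = ((double) (n - 2) / (n - 1)) * s2 + (x - prevMean) * (x - prevMean) / n;
        mean = (x + (n - 1) * prevMean) / n;
    }

    double sampleVariance() { return s2; }
}
```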






  • Awesome! Thanks!
    – user3019105, Feb 24 '18 at 18:33



















Answer (1 vote)
If it is of any help, I found what seems to be a much nicer way here: Wikipedia's online algorithm.

The following is how I have implemented it in Java. It has worked very well for me. I hope it is clear enough; please feel free to ask me questions about it.







// static state shared by the two methods below
private static double sum_avg = 0; // running sum of the values (for the mean)
private static double sum_sd = 0;  // running sum of squared differences to the mean
private static double avg = 0;     // last computed mean

/**
 * standardDeviation() - calculates the standard deviation of a data set incrementally,
 * using the last entered value and the previous sum of differences to the mean.
 * (i.e. upon adding a value to the data set this function should immediately be called)
 *
 * NOTE: do not call this function if the data set size is less than 2, since standard
 *       deviation cannot be calculated on a single value
 * NOTE: sum_avg, sum_sd and avg are all static variables
 * NOTE: on attempting to use this on another set following previous use, the static
 *       values will have to be reset
 *
 * @param vector - List<Double> - data with only one additional value from previous method call
 * @return updated value for the standard deviation
 */
public static double standardDeviation(List<Double> vector)
{
    double N = (double) vector.size(); // size of the data set
    double oldavg = avg;               // save the old average
    avg = updateAverage(vector);       // update the new average

    if (N == 2.0) // with only two values, calculate the base sum using the standard formula
    {
        for (double d : vector) // cycle through the 2 elements of the data set
        {
            sum_sd += (d - avg) * (d - avg); // accumulate squared differences to the mean
        }
    }
    else if (N > 2) // once we have calculated the base sum_sd
    {
        double newel = vector.get(vector.size() - 1); // the latest addition to the data set
        // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
        sum_sd = sum_sd + (newel - oldavg) * (newel - avg);
    }
    return Math.sqrt(sum_sd / N); // N or N-1 depends on your choice of population or sample standard deviation
}

/**
 * Simplistic method for incrementally calculating the mean of a data set.
 *
 * @param vector - List<Double> - data with only one additional value from previous method call
 * @return updated value for the mean of the given data set
 */
public static double updateAverage(List<Double> vector)
{
    if (vector.size() == 2) {
        sum_avg = vector.get(vector.size() - 1) + vector.get(vector.size() - 2);
    }
    else {
        sum_avg += vector.get(vector.size() - 1);
    }
    return sum_avg / (double) vector.size();
}






  • This points to exactly the same source as the answer by @oldrinb two years earlier. The good thing here is that this is self-contained, while the older one is just a link (with a keyword reference).
    – Lee David Chung Lin, Feb 14 at 14:25




















share|cite|improve this answer









$endgroup$









  • 1




    $begingroup$
    I think the @user wanted a formula that lets you incrementally calculate the standard deviation as you increase the data set in a way that is computationally efficient, not this.
    $endgroup$
    – guy
    Jan 27 '12 at 17:52






  • 1




    $begingroup$
This is actually computationally efficient, because you just need to count the number of samples, the sum of the samples, and the sum of the squares of each sample. Once you need to "read" the standard deviation (or average), you calculate the square root, which costs two multiplies, one subtraction, one square root, and one divide. On most modern CPUs, this is highly efficient (and very cache friendly).
    $endgroup$
    – Jon Watte
    Aug 1 '14 at 3:39












  • $begingroup$
Furthermore, this method also makes it easy to compute a running standard deviation, which is what I was looking for: in addition to adding the new value to the sums of values and squares, remove the oldest value from them as well. Sorry for the late comment, but I found this page after googling this issue and I hope it will let future people find this more easily.
    $endgroup$
    – Quentin
    Feb 11 '16 at 16:31
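Quentin's running (sliding-window) variant can be sketched in Python like this (a hypothetical helper assuming a fixed window size):

```python
import math
from collections import deque

class WindowStd:
    """Population standard deviation over the most recent `size` values."""
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.s = 0.0  # sum of the values currently in the window
        self.q = 0.0  # sum of the squared values currently in the window

    def add(self, x):
        self.window.append(x)
        self.s += x
        self.q += x * x
        if len(self.window) > self.size:
            old = self.window.popleft()  # drop the oldest value from both sums
            self.s -= old
            self.q -= old * old

    def std(self):
        n = len(self.window)
        return math.sqrt(self.q / n - (self.s / n) ** 2)

w = WindowStd(3)
for x in [1, 2, 3, 10]:
    w.add(x)
print(w.std())  # ~3.559, the population std of the last three values [2, 3, 10]
```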












  • $begingroup$
I have tried to implement this in R. However, I seem to be doing something wrong, as I don't get correct standard deviation values. E.g. for the three x values 1, 2, and 3 the standard deviation should be 1. T0 = 3 (1^0 + 2^0 + 3^0), T1 = 6 (1^1 + 2^1 + 3^1), T2 = 14 (1^2 + 2^2 + 3^2). The final standard deviation value according to the formula above would be 0.8165, which is different from 1. Could you tell me where I am making a mistake?
    $endgroup$
    – Phil
    Oct 18 '18 at 16:00










    $begingroup$
@Phil: the formula I gave, $\sqrt{E[X^2]-E[X]^2}$, is the population standard deviation. Remember that the sample standard deviation differs from this by a factor of $\sqrt{n/(n-1)}$. To get the sample standard deviation in your case you would multiply by $\sqrt{3/2}$, giving the value of $1$ that you were expecting.
    $endgroup$
    – mjqxxxx
    Oct 22 '18 at 18:20
















answered Jan 27 '12 at 15:37 – mjqxxxx
$begingroup$

What you refer to as an incremental computation is very close to the computer scientist's notion of an online algorithm. There is in fact a well-known online algorithm for computing the variance, and thus the standard deviation, of a sequence of data, documented here.
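For reference, a short Python sketch of that online algorithm (Welford's method), which updates the mean and the sum of squared deviations in a single pass:

```python
import math

def welford_std(values, sample=False):
    """Single-pass (online) standard deviation via Welford's update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both the old and the new mean
    return math.sqrt(m2 / (n - 1 if sample else n))

print(welford_std([1, 2, 3], sample=True))   # 1.0 (sample std)
print(welford_std([1, 2, 3], sample=False))  # ~0.8165 (population std)
```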






$endgroup$













  • $begingroup$
    Just for the record, the algorithm in the linked wikipedia entry was implemented in an answer two years later.
    $endgroup$
    – Lee David Chung Lin
    Feb 14 at 14:28


















answered Jul 31 '15 at 2:36 – oldrinb
$begingroup$

Another way of saying the above is that you need to keep a count, an incremental sum of values, and an incremental sum of squares.

Let $N$ be the count of values seen so far, $S = \sum_{i=1}^{N} x_i$ and $Q = \sum_{i=1}^{N} x_i^2$ (where both $S$ and $Q$ are maintained incrementally).
Then at any stage,

the mean is $\frac{S}{N}$

and the variance is $\frac{Q}{N} - \left( \frac{S}{N} \right)^2$

An advantage of this method is that it needs only a single pass and a fixed amount of storage. (Note, however, that the subtraction can suffer catastrophic cancellation after a long series of calculations when the mean is large relative to the spread; the online algorithm mentioned in the other answer is more robust in that case.)






$endgroup$













  • $begingroup$
This is the best way to achieve the goal: a single-pass calculation with a fixed amount of storage required. But is it mathematically valid?
    $endgroup$
    – Edward Ross
    Aug 25 '16 at 13:12










  • $begingroup$
indeed (let $X=S/N$ be the mean): $s^2=\frac{1}{N}\sum_{i=1}^N (x_i-X)^2 = \frac{1}{N}\sum_{i=1}^N (x_i^2 - 2x_iX+X^2)=\frac{1}{N}\sum_{i=1}^N x_i^2-\frac{1}{N}\sum_{i=1}^N 2x_iX+\frac{1}{N}\sum_{i=1}^N X^2=Q/N-2X^2+X^2=Q/N-X^2=Q/N-(S/N)^2$
    $endgroup$
    – dennyrolling
    Apr 26 '17 at 20:52
















answered Jul 31 '15 at 2:32 – Philip Kilby
$begingroup$

Forgive my poor math background, what I need is detail!



I added my progress here for someone like me.



$$
s_n^2=\frac {\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^n(x_i - \bar{x}_{n-1} + \bar{x}_{n-1} - \bar{x}_n)^2}{n-1} \\
= \frac {\sum_{i=1}^{n}(x_i - \bar{x}_{n-1})^2 + 2\sum_{i=1}^n(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {\left(\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2 + (x_n - \bar{x}_{n-1})^2\right) + \left(2\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n)\right) + \sum_{i=1}^n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n - \bar{x}_{n-1})^2 + 0 + 2(x_n - \bar{x}_{n-1})(\bar{x}_{n-1} - \bar{x}_n) + n(\bar{x}_{n-1} - \bar{x}_n)^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 - 2x_n\bar{x}_{n-1} + \bar{x}_{n-1}^2 + 2x_n \bar{x}_{n-1} - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + x_n^2 + \bar{x}_{n-1}^2 - 2x_n \bar{x}_n - 2\bar{x}_{n-1}^2 + 2\bar{x}_{n-1}\bar{x}_n + n\bar{x}_{n-1}^2 - 2n\bar{x}_{n-1}\bar{x}_n + n\bar{x}_n^2} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (x_n^2 - 2x_n\bar{x}_n + \bar{x}_n^2) + (n-1)(\bar{x}_{n-1}^2 - 2\bar{x}_{n-1}\bar{x}_n + \bar{x}_n^2)} {n-1} \\
= \frac {(n-2)s_{n-1}^2 + (n-1)(\bar{x}_{n-1} - \bar{x}_n)^2 + (x_n - \bar{x}_n)^2} {n-1} \\
= \frac {n-2}{n-1}s_{n-1}^2 + (\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1}
$$

and

$$
(\bar{x}_{n-1} - \bar{x}_n)^2 + \frac {(x_n - \bar{x}_n)^2}{n-1} \\
= \left(\bar{x}_{n-1} - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2 + \frac {\left(x_n - \frac {x_n + (n-1)\bar{x}_{n-1}}{n}\right)^2}{n-1} \\
= \frac {1}{n} (x_n - \bar{x}_{n-1})^2
$$

so

$$
s_n^2 = \frac{n-2}{n-1}s_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2
$$
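The final recurrence can be checked with a short Python sketch (hypothetical helper name; it seeds the mean with the first value and applies the recurrence from $n = 2$ on):

```python
import math

def incremental_sample_std(values):
    """Sample std via s_n^2 = (n-2)/(n-1)*s_{n-1}^2 + (x_n - mean_{n-1})^2 / n."""
    mean, var = values[0], 0.0
    for n, x in enumerate(values[1:], start=2):
        var = (n - 2) / (n - 1) * var + (x - mean) ** 2 / n
        mean += (x - mean) / n  # update the running mean afterwards
    return math.sqrt(var)

print(incremental_sample_std([1, 2, 3]))  # 1.0
```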






$endgroup$













  • $begingroup$
    Awesome! Thanks!
    $endgroup$
    – user3019105
    Feb 24 '18 at 18:33
















answered Sep 2 '17 at 11:38 – secfree
$begingroup$

If it is of any help, I found what seems to be a much nicer way here: wikipedia - online algorithm.

The following is how I have implemented it in Java. It has worked very well for me. Hope it is clear enough; please feel free to ask me questions about it.







/**
 * standardDeviation() - incrementally calculates the standard deviation of a data set,
 * using the last entered value and the previously recorded sum of squared differences to the mean.
 * (i.e. call this immediately after adding a value to the data set)
 *
 * NOTE: do not call this function if the data set size is less than 2, since a standard deviation cannot be calculated from a single value
 * NOTE: sum_avg, sum_sd and avg are all static variables
 * NOTE: before reusing this on a new data set, the static values must be reset
 *
 * @param vector - List<Double> - data with only one additional value since the previous method call
 * @return updated value of the standard deviation
 */
public static double standardDeviation(List<Double> vector)
{
    double N = (double) vector.size();  // size of the data set
    double oldavg = avg;                // save the old average
    avg = updateAverage(vector);        // update the new average

    if (N == 2.0) // with exactly two values, compute the base sum of squared deviations directly
    {
        for (double d : vector) // the set is small enough to cycle through both elements
        {
            sum_sd += Math.pow(d - avg, 2); // accumulate the squared deviation from the mean
        }
    }
    else if (N > 2) // once the base sum_sd has been calculated, update it incrementally
    {
        double newel = vector.get(vector.size() - 1); // the latest addition to the data set

        sum_sd = sum_sd + (newel - oldavg) * (newel - avg); // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
    }
    return Math.sqrt(sum_sd / N); // N or N-1 depending on your choice of population or sample standard deviation
}

/**
 * Simple method for incrementally calculating the mean of a data set.
 *
 * @param vector - List<Double> - data with only one additional value since the previous method call
 * @return updated value of the mean of the given data set
 */
public static double updateAverage(List<Double> vector)
{
    if (vector.size() == 2) {
        sum_avg = vector.get(vector.size() - 1) + vector.get(vector.size() - 2);
    }
    else {
        sum_avg += vector.get(vector.size() - 1);
    }

    return sum_avg / (double) vector.size();
}
















$endgroup$
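The same update can also be packaged without static state, which sidesteps the reset caveat in the NOTEs above; this is essentially the Welford form of the linked online algorithm (the class name is illustrative):

```java
/** Welford's online algorithm, as described on the linked Wikipedia page. */
class Welford {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;          // update the mean first
        m2 += delta * (x - mean);   // uses the old and the new mean, like sum_sd above
    }

    double populationStdDev() { return n > 0 ? Math.sqrt(m2 / n) : 0.0; }
    double sampleStdDev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }
}
```

Keeping the state in instance fields means several data sets can be tracked at once, and constructing a fresh object replaces the manual reset of the static variables.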













  • $begingroup$
    This points to exactly same source as the the answer by @oldrinb two years earlier. The good thing here is that this is self-contained while the older one is just a link (with a keyword reference).
    $endgroup$
    – Lee David Chung Lin
    Feb 14 at 14:25


















edited Jul 12 '17 at 12:12

answered Jul 12 '17 at 12:02

KevinCorcor
































Thanks for contributing an answer to Mathematics Stack Exchange!