Resetting outliers in a timeseries dataframe to 3 SD

Domain: Python & Pandas

I have a time series data frame which has the total number of customers for each day for the last 10 years.

The columns are:

date

total customers

There are outliers in my total customers column.

I want to reset every outlier that lies more than 3 standard deviations above the mean to the value defined by the formula below.

Replacement value for an outlier above 3 SD = mean + 3 × SD

edited Nov 21 '18 at 21:51

asked Nov 21 '18 at 21:39

zosh

267

python dataframe statistics

1 Answer

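You could use the .clip() method with its upper argument to limit values in the total customers column to mean + 3*sd. (Older pandas releases also offered .clip_upper() for this, but it was deprecated in 0.24 and removed in 1.0.)

# mean and sample standard deviation (ddof=1) of the column
m = df['total customers'].mean()

sd = df['total customers'].std()

# cap every value above mean + 3*sd at that threshold
df['total customers'] = df['total customers'].clip(upper=m + 3*sd)

See the pandas documentation for Series.clip.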
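As a minimal, self-contained sketch of the capping step, the snippet below builds a small made-up frame and applies the same threshold. The sample values and the names col and upper are assumptions for illustration only; the column names follow the question. It uses Series.clip(upper=...), since clip_upper is not available in current pandas releases.

```python
import pandas as pd

# Hypothetical sample data: 19 ordinary days and one obvious spike.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=20, freq="D"),
    "total customers": [100] * 19 + [1000],
})

col = "total customers"

# Threshold from the question: mean + 3 standard deviations.
# Series.std() uses the sample standard deviation (ddof=1) by default.
upper = df[col].mean() + 3 * df[col].std()

# Values above the threshold are replaced by the threshold itself;
# no rows are dropped and values at or below it are left unchanged.
df[col] = df[col].clip(upper=upper)
```

Note that the mean and standard deviation are computed before clipping, so the spike itself inflates the threshold. Whether that is acceptable depends on the data.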

answered Nov 21 '18 at 21:44

Craig

2,1961819

Thank you so much for your reply

– zosh
Nov 21 '18 at 21:52

This function does exactly what you are asking for. It replaces any values that exceed the 'clip' value with the 'clip' value. It does not remove anything.

– Craig
Nov 21 '18 at 21:54

Got it, thank you so much.

– zosh
Nov 21 '18 at 21:55

Hey Craig, sorry to bother you again: What if I wanted to completely remove all the rows with outliers?

– zosh
Nov 21 '18 at 22:13

@zosh - That's a new question, but the answer is to use boolean indexing as described in stackoverflow.com/a/23200666/7517724

– Craig
Nov 22 '18 at 0:13
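The boolean indexing mentioned in the comment above could look roughly like this. It is a sketch under the same assumptions as the question (a column named total customers, threshold of mean + 3 SD); the sample data and the names upper and filtered are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 19 ordinary days and one obvious spike.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=20, freq="D"),
    "total customers": [100] * 19 + [1000],
})

col = "total customers"
upper = df[col].mean() + 3 * df[col].std()

# Build a boolean mask and keep only the rows at or below the threshold.
# The result is a new frame; df itself is not modified.
filtered = df[df[col] <= upper]
```

Unlike clipping, this drops the affected dates entirely, which leaves gaps in a daily time series.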
