Resetting outliers in a timeseries dataframe to 3 SD



























Domain: Python & Pandas



I have a time series data frame which has the total number of customers for each day for the last 10 years.



The columns are:




  • date

  • total customers


There are outliers in my total customers column.



I want to cap the outliers that lie more than 3 standard deviations above the mean, replacing each one with the value given by the formula below.



Replacement value for an outlier above 3 SD = mean + 3 × SD










      python dataframe statistics














      edited Nov 21 '18 at 21:51







      zosh

















      asked Nov 21 '18 at 21:39









      zosh
























          1 Answer














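          You could use the .clip_upper() method to limit values in the total customers column to mean + 3*sd. Note that clip_upper() was deprecated in pandas 0.24 and removed in pandas 1.0; on current versions the equivalent call is .clip(upper=...), used below.



          m = df['total customers'].mean()
          sd = df['total customers'].std()
          # Values above m + 3*sd are replaced by m + 3*sd; other values are unchanged.
          df['total customers'] = df['total customers'].clip(upper=m + 3*sd)


          Here's the documentation for clip_upper.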
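
          As a rough, self-contained illustration of the capping approach above, the sketch below builds a small made-up frame (the sample values, the seed, and the names `col` and `upper` are assumptions, not part of the question) and caps one spike at mean + 3 SD using `Series.clip`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 99 ordinary days and one extreme spike.
rng = np.random.default_rng(0)
counts = rng.normal(loc=100, scale=10, size=99).round()
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=100, freq="D"),
    "total customers": np.append(counts, 1000.0),
})

col = "total customers"
upper = df[col].mean() + 3 * df[col].std()  # cap at mean + 3 SD

# Values above the cap are replaced by the cap; no rows are dropped.
df[col] = df[col].clip(upper=upper)
```

          Note that the mean and standard deviation here are computed with the spike still in the data, so the spike itself pulls the cap upward.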






























          • Thank you so much for your reply

            – zosh
            Nov 21 '18 at 21:52








          • This function does exactly what you are asking for. It replaces any values that exceed the 'clip' value with the 'clip' value. It does not remove anything.

            – Craig
            Nov 21 '18 at 21:54













          • got it thank you so much

            – zosh
            Nov 21 '18 at 21:55











          • Hey Craig, sorry to bother you again: What if I wanted to completely remove all the rows with outliers?

            – zosh
            Nov 21 '18 at 22:13











          • @zosh - That's a new question, but the answer is to use boolean indexing as described in stackoverflow.com/a/23200666/7517724

            – Craig
            Nov 22 '18 at 0:13
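
          The boolean indexing mentioned in the comment above could look roughly like the sketch below. It is only an illustration under assumed data: the sample values and the names `col`, `upper`, and `filtered` are made up here and do not come from the linked answer.

```python
import pandas as pd

# Hypothetical sample data: 19 ordinary days and one extreme spike.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=20, freq="D"),
    "total customers": [100, 102, 98, 101, 99] * 3 + [100, 102, 98, 101, 5000],
})

col = "total customers"
upper = df[col].mean() + 3 * df[col].std()

# Keep only the rows at or below the cap; rows above it are removed.
filtered = df[df[col] <= upper]
```

          Unlike clipping, this changes the number of rows, so the filtered frame no longer has one entry per day.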





















          answered Nov 21 '18 at 21:44









          Craig












