Pandas - Delete Rows with only NaN values












I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.



I tried using the dropna function in several ways, but it seems to delete every row (or column) that contains even a single NaN value.



This question (Slice Pandas DataFrame by Row) shows me that if I can compile a list of the rows that have too many NaN values, I can delete them all with a simple



df.drop(rows)


I know I can count non-null values with the count function, which I could then subtract from the total number of columns to get the NaN count (is there a direct way to count NaN values in a row?). Even so, I am not sure how to write a loop that goes through a DataFrame row by row.



Here's some pseudo-code that I think is on the right track:



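# Loop over each row, collecting the labels of rows with 7 or more NaN values
total = len(df.columns)
rows = []
for label, row in df.iterrows():
    m = total - row.count()  # row.count() counts the non-NaN values
    if m >= 7:
        rows.append(label)
df = df.drop(rows)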
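A minimal loop-free sketch of the same idea, using a small made-up 9-column DataFrame (the data, the row labels and the name `max_nan` are illustrative assumptions, not from the question). `df.isna().sum(axis=1)` counts the NaN values in each row directly, which is equivalent to `len(df.columns) - df.count(axis=1)`:

```python
import numpy as np
import pandas as pd

# Hypothetical 9-column frame: row "b" has 7 NaN values, row "c" has 8, row "d" has 3
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7, 8, 9],
        [1, 2] + [np.nan] * 7,
        [1] + [np.nan] * 8,
        [np.nan] * 3 + [1, 2, 3, 4, 5, 6],
    ],
    index=["a", "b", "c", "d"],
)

max_nan = 7  # drop rows with this many NaN values or more

# Direct per-row NaN count, no explicit loop needed
nan_per_row = df.isna().sum(axis=1)

# Collect the labels of the offending rows, then drop them in one call
rows = df.index[nan_per_row >= max_nan].tolist()
cleaned = df.drop(rows)

print(rows)                    # ['b', 'c']
print(cleaned.index.tolist())  # ['a', 'd']
```

Building the label list first and calling `drop` once avoids modifying the DataFrame while iterating over it.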


I am still new to Pandas, so I'm very open to other ways of solving this problem, whether simpler or more complex.







python pandas rows dataframe






edited May 23 '17 at 12:31 by Community










asked Aug 5 '14 at 18:56 by Slavatron








  • There is a thresh param to specify the minimum number of non-NA values: pandas.pydata.org/pandas-docs/stable/generated/… Have you tried this?
    – EdChum
    Aug 5 '14 at 19:07
  • I had not noticed that, thank you. It suits my needs perfectly.
    – Slavatron
    Aug 5 '14 at 19:12
  • df.dropna(thresh=3) was all I needed (there are 9 columns in the DataFrame).
    – Slavatron
    Aug 5 '14 at 19:25
  • I thought I'd put a dynamic method in my answer for the case where you don't know the number of columns. Glad I could help.
    – EdChum
    Aug 5 '14 at 19:26
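A quick check of the arithmetic in the thresh=3 comment, on a hypothetical 9-column frame: a row with 7 or more NaN values has at most 2 non-NaN values, so `thresh=3` (keep rows with at least 3 non-NaN values) should drop exactly those rows. The sketch compares it with an explicit NaN count:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with 0, 6, 7 and 9 NaN values across 9 columns
nan_counts = [0, 6, 7, 9]
df = pd.DataFrame([[np.nan] * k + [1.0] * (9 - k) for k in nan_counts])

# thresh = minimum number of non-NaN values a row needs in order to be kept
kept = df.dropna(thresh=3)

# Same result via an explicit per-row NaN count
mask_kept = df[df.isna().sum(axis=1) < 7]

print(kept.index.tolist())     # [0, 1]
print(kept.equals(mask_kept))  # True
```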














2 Answers


















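Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values a row must have, and drop the rows that don't meet this criterion. A row with 7 or more NaN values has at most len(df.columns) - 7 non-NaN values, so the minimum needed to keep a row is len(df.columns) - 6:

df.dropna(thresh=len(df.columns) - 6)

See the docs.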
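A minimal sketch of the dynamic version, wrapped in a hypothetical helper `drop_sparse_rows` (not a pandas function; the name and the `max_nan` parameter are illustrative). It derives `thresh` from `len(df.columns)`, so the column count does not need to be known in advance:

```python
import numpy as np
import pandas as pd


def drop_sparse_rows(df: pd.DataFrame, max_nan: int) -> pd.DataFrame:
    """Hypothetical helper: drop rows that contain max_nan or more NaN values."""
    # A row with max_nan or more NaN values has at most len(df.columns) - max_nan
    # non-NaN values, so a row needs one more than that to be kept.
    thresh = len(df.columns) - max_nan + 1
    return df.dropna(thresh=thresh)


df = pd.DataFrame(
    {
        "x": [1.0, np.nan, np.nan],
        "y": [2.0, 5.0, np.nan],
        "z": [3.0, np.nan, np.nan],
    }
)

print(drop_sparse_rows(df, max_nan=2).index.tolist())  # [0]
print(drop_sparse_rows(df, max_nan=3).index.tolist())  # [0, 1]
```

Using `len(df.columns)` (or equivalently `df.shape[1]`) rather than `len(df)` matters here: `len(df)` is the number of rows.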















edited Nov 19 '18 at 21:12
answered Aug 5 '14 at 19:15 by EdChum








  • I had to use len(df.columns) instead of len(df). Worked like a charm.
    – thecircus
    Sep 1 '15 at 15:26
  • Doesn't axis=1 tell it to drop columns? At least in my case, columns get deleted when I choose axis=1.
    – xkcd
    Feb 22 '16 at 17:35
  • @xkcd It depends on the function; in this case it's the opposite.
    – EdChum
    Feb 22 '16 at 17:48
  • axis=1 will drop the columns, not the rows. "{0 or ‘index’, 1 or ‘columns’}", straight from the docs.
    – Paul English
    Jul 14 '16 at 19:07
  • @PaulEnglish You're correct. I'm not sure whether this was due to an error in the docs historically or whether I was confusing this with drop, which does flip the expected meaning of axis. I will update; thanks for pointing this out.
    – EdChum
    Jul 15 '16 at 8:46
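A small sketch of the `axis` behaviour discussed in these comments, on a hypothetical frame: for `dropna`, the default `axis=0` (or `'index'`) drops rows, while `axis=1` (or `'columns'`) drops columns, and `thresh` then counts non-NaN values per column instead of per row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 and column "b" are entirely NaN
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],
        "c": [7.0, np.nan, 9.0],
    }
)

# axis=0 (default): thresh counts non-NaN values per row, rows are dropped
rows_dropped = df.dropna(axis=0, thresh=1)

# axis=1: thresh counts non-NaN values per column, columns are dropped
cols_dropped = df.dropna(axis=1, thresh=1)

print(rows_dropped.index.tolist())    # [0, 2]
print(cols_dropped.columns.tolist())  # ['a', 'c']
```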














The optional thresh argument of df.dropna lets you give it the minimum number of non-NA values a row must have in order to be kept. To drop rows with 7 or more NaN values:

df.dropna(thresh=df.shape[1] - 6)
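For the literal case in the title, rows made up entirely of NaN values, `dropna(how="all")` is enough and needs no column count. A small sketch on a hypothetical frame, contrasted with the default `how="any"`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has one NaN, row 2 is entirely NaN
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, 4.0, np.nan],
    }
)

# how="all": drop a row only if every value in it is NaN
only_all_nan_removed = df.dropna(how="all")

# how="any" (the default): drop a row if it contains at least one NaN
any_nan_removed = df.dropna()

print(only_all_nan_removed.index.tolist())  # [0, 1]
print(any_nan_removed.index.tolist())       # [0]
```

Here `thresh=1` would give the same result as `how="all"`, since it keeps any row with at least one non-NaN value.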
















answered Aug 5 '14 at 19:14 by Roger Fan





























