Pandas - Delete Rows with only NaN values
I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.
I tried using the dropna function in several ways, but it seems to greedily delete any column or row that contains a NaN value.
This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple
df.drop(rows)
I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (Is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.
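Both ways of getting a per-row NaN count can be compared on a small frame. This is a minimal sketch: the frame `df` and its columns `a` to `d` are hypothetical, and it assumes pandas 0.21 or later (older versions spell `isna` as `isnull`).

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 4-column frame
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, np.nan, 5.0],
    "c": [np.nan, np.nan, 6.0],
    "d": [4.0, 7.0, np.nan],
})

# Direct count: isna() gives a boolean frame; summing across columns counts the True values per row
nan_per_row = df.isna().sum(axis=1)

# Indirect count: total number of columns minus the non-null count per row
nan_per_row_indirect = len(df.columns) - df.count(axis=1)

print(nan_per_row.tolist())  # [1, 3, 2]
```

Both Series hold the same values, so either can feed a row filter.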
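Here's a loop that I think is on the right track:
### LOOP FOR ADDRESSING EACH row:
total = len(df.columns)              # number of columns
rows = []                            # labels of rows to delete
for label, row in df.iterrows():
    m = total - row.count()          # NaN count = columns minus non-null values
    if m >= 7:                       # "7 or more", so >= rather than >
        rows.append(label)
df = df.drop(rows)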
I am still new to Pandas, so I'm very open to other ways of solving this problem, whether they're simpler or more complex.
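The loop above can also be done without iterating row by row: count the NaN values per row in one vectorized call, collect the labels of rows over the limit, and pass them to `df.drop`. This is a sketch only; the helper name `drop_rows_with_many_nans` and the 9-column sample frame are hypothetical, and it assumes pandas 0.21 or later for `isna` and `drop(index=...)`.

```python
import numpy as np
import pandas as pd

def drop_rows_with_many_nans(df: pd.DataFrame, limit: int = 7) -> pd.DataFrame:
    """Hypothetical helper: drop rows that have `limit` or more NaN values."""
    nan_counts = df.isna().sum(axis=1)               # NaN count per row
    rows = nan_counts[nan_counts >= limit].index     # labels of rows with too many NaN
    return df.drop(index=rows)                       # same idea as df.drop(rows)

# Hypothetical 9-column frame: row "x" has 7 NaN, row "y" has 6, row "z" has none
df = pd.DataFrame(
    [[np.nan] * 7 + [1.0, 2.0],
     [np.nan] * 6 + [1.0, 2.0, 3.0],
     [float(i) for i in range(9)]],
    index=["x", "y", "z"],
)

print(drop_rows_with_many_nans(df).index.tolist())  # ['y', 'z']
```

The same filter can be written as a boolean mask, `df[df.isna().sum(axis=1) < 7]`, which skips building the list of labels.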
python pandas rows dataframe
edited May 23 '17 at 12:31 by Community♦
asked Aug 5 '14 at 18:56 by Slavatron
There is a thresh param to specify the minimum number of non-NA values: pandas.pydata.org/pandas-docs/stable/generated/… have you tried this?
– EdChum, Aug 5 '14 at 19:07
I had not noticed that, thank you. It suits my needs perfectly.
– Slavatron, Aug 5 '14 at 19:12
df.dropna(thresh=3) was all I needed (there are 9 columns in the dataframe)
– Slavatron, Aug 5 '14 at 19:25
I thought I'd put a dynamic method in my answer in case you don't know the number of columns; glad I could help
– EdChum, Aug 5 '14 at 19:26
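Why `thresh=3` fits 9 columns: `thresh` is the minimum number of non-NaN values a row needs to be kept, so "drop rows with 7 or more NaN" becomes "keep rows with at least 9 - 6 = 3 non-NaN values". A minimal check on a hypothetical 9-column frame, cross-checked against a mask-based filter:

```python
import numpy as np
import pandas as pd

# Hypothetical 9-column frame with 8, 7, 6 and 0 NaN values per row
df = pd.DataFrame(
    [[np.nan] * 8 + [1.0],
     [np.nan] * 7 + [1.0, 2.0],
     [np.nan] * 6 + [1.0, 2.0, 3.0],
     [float(i) for i in range(9)]],
    index=["eight", "seven", "six", "none"],
)

# Keep rows with at least 3 non-NaN values (i.e. at most 6 NaN)
kept = df.dropna(thresh=3)
print(kept.index.tolist())  # ['six', 'none']

# Cross-check: keep rows with fewer than 7 NaN values
mask_kept = df[df.isna().sum(axis=1) < 7]
print(kept.equals(mask_kept))  # True
```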
2 Answers
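Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values, and drop the rows that don't meet this criterion:
df = df.dropna(thresh=len(df.columns) - 6)  # keep rows with at least (n_columns - 6) non-NaN values, i.e. drop rows with 7 or more NaN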
See the docs
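A sketch of computing the threshold from the column count, so the number of columns never has to be hard-coded. The helper `thresh_for_max_nan` and the sample frame are hypothetical. It also shows two pitfalls: `len(df)` counts rows, not columns, and `thresh = n_columns - k` keeps rows with up to k NaN values, so dropping rows with k or more NaN needs `n_columns - k + 1`.

```python
import numpy as np
import pandas as pd

def thresh_for_max_nan(df: pd.DataFrame, max_nan: int) -> int:
    """Hypothetical helper: the `thresh` value that keeps rows with at most max_nan NaN values."""
    return len(df.columns) - max_nan

# Hypothetical 9-column frame with 7 and 6 NaN values per row
df = pd.DataFrame(
    [[np.nan] * 7 + [1.0, 2.0],
     [np.nan] * 6 + [1.0, 2.0, 3.0]],
    index=["seven", "six"],
)

# len(df) is the number of ROWS; len(df.columns) is the number of columns
print(len(df), len(df.columns))  # 2 9

# Keep rows with at most 6 NaN, i.e. drop rows with 7 or more
print(df.dropna(thresh=thresh_for_max_nan(df, 6)).index.tolist())  # ['six']

# thresh = 9 - 7 = 2 keeps rows with up to 7 NaN, so the 7-NaN row survives
print(df.dropna(thresh=len(df.columns) - 7).index.tolist())  # ['seven', 'six']
```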
edited Nov 19 '18 at 21:12
answered Aug 5 '14 at 19:15 by EdChum
I had to use len(df.columns) instead of len(df). Worked like a charm.
– thecircus, Sep 1 '15 at 15:26
Doesn't axis=1 tell it to drop columns? At least in my case, columns get deleted when I choose axis=1.
– xkcd, Feb 22 '16 at 17:35
@xkcd it depends on the function; in this case it's the opposite
– EdChum, Feb 22 '16 at 17:48
axis=1 will drop the columns, not the rows. "{0 or ‘index’, 1 or ‘columns’}" straight from the docs.
– Paul English, Jul 14 '16 at 19:07
@PaulEnglish You're correct. I'm not sure if this was due to an error in the docs historically or if I was confusing this with drop, which does flip the expected meaning of axis. Will update, and thanks for pointing this out
– EdChum, Jul 15 '16 at 8:46
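To settle the axis question for dropna concretely: axis=0 (the default, also accepted as "index") drops rows, and axis=1 (or "columns") drops columns; thresh counts non-NaN values along the same direction. A minimal sketch on a hypothetical 3x3 frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" has two NaN, column "a" has one
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [1.0, 2.0, 3.0],
})

# axis=0 drops ROWS that contain any NaN
rows_any = df.dropna(axis=0)
# axis=1 drops COLUMNS that contain any NaN
cols_any = df.dropna(axis=1)

# thresh is counted per row for axis=0 ...
rows_thresh = df.dropna(axis=0, thresh=2)
# ... and per column for axis="columns"
cols_thresh = df.dropna(axis="columns", thresh=2)

print(rows_any.index.tolist())       # [2]
print(cols_any.columns.tolist())     # ['c']
print(rows_thresh.index.tolist())    # [0, 2]
print(cols_thresh.columns.tolist())  # ['a', 'c']
```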
The optional thresh argument of df.dropna lets you give it the minimum number of non-NA values a row needs in order to be kept.
df = df.dropna(thresh=df.shape[1] - 6)  # df.shape[1] is the column count; rows with 7 or more NA values are dropped
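Two details worth checking in this approach: `df.shape[1]` is the same number as `len(df.columns)`, and `dropna` returns a new DataFrame, so the result has to be assigned back (the original is left unchanged). A minimal sketch on a hypothetical 9-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical 9-column frame with 7, 6 and 0 NaN values per row
df = pd.DataFrame(
    [[np.nan] * 7 + [1.0, 2.0],
     [np.nan] * 6 + [1.0, 2.0, 3.0],
     [float(i) for i in range(9)]],
    index=["seven", "six", "none"],
)

# df.shape is (n_rows, n_columns), so df.shape[1] is the column count
print(df.shape[1] == len(df.columns))  # True

# dropna returns a new frame; df itself still has all 3 rows
result = df.dropna(thresh=df.shape[1] - 6)
print(len(df), len(result))  # 3 2

df = result  # reassign to keep the filtered frame
```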
answered Aug 5 '14 at 19:14 by Roger Fan