How to implement 'in' and 'not in' for Pandas dataframe
How can I achieve the equivalents of SQL's IN and NOT IN?
I have a list with the required values.
Here's the scenario:
import pandas as pd
df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']
# pseudo-code (does not run as written):
df[df['countries'] not in countries]
My current way of doing this is as follows:
df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = pd.DataFrame({'countries': ['UK', 'China'], 'matched': True})
# IN
df.merge(countries, how='inner', on='countries')
# NOT IN
not_in = df.merge(countries, how='left', on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]
But this seems like a horrible kludge. Can anyone improve on it?
Tags: python, pandas, dataframe, sql-function
edited Jul 17 '15 at 20:25 by smci
asked Nov 13 '13 at 17:11 by LondonRob
I think your solution is the best solution. Yours can cover IN, NOT_IN of multiple columns. – Bruce Jung, Mar 17 '15 at 1:55
Do you want to test on single column or multiple columns? – smci, Jul 17 '15 at 20:26
Related (performance / pandas internals): Pandas pd.Series.isin performance with set versus array – jpp, Jun 28 '18 at 0:06
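For the multi-column case raised in the comments, the merge approach from the question can be kept but tidied with the indicator=True option of DataFrame.merge, which avoids the hand-made 'matched' column. This is only a sketch: the 'city' column and the name wanted are assumptions made up for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical two-column data; the 'city' column is an assumption for illustration.
df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'city': ['NYC', 'London', 'Berlin', 'Beijing'],
})
# Hypothetical lookup table of (country, city) pairs to test against.
wanted = pd.DataFrame({
    'countries': ['UK', 'China'],
    'city': ['London', 'Shanghai'],
})

# A left merge with indicator=True adds a '_merge' column whose value is
# 'both' for rows found in the lookup table and 'left_only' otherwise.
merged = df.merge(wanted, how='left', on=['countries', 'city'], indicator=True)

# IN: the (country, city) pair appears in the lookup table.
in_rows = merged.loc[merged['_merge'] == 'both', ['countries', 'city']]
# NOT IN: the pair does not appear in the lookup table.
not_in_rows = merged.loc[merged['_merge'] == 'left_only', ['countries', 'city']]
```

If the lookup table can contain repeated pairs, it is probably worth calling wanted.drop_duplicates() first, since a left merge repeats a left row once per matching right row.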
5 Answers
You can use pd.Series.isin.
For "IN" use: something.isin(somewhere)
Or for "NOT IN": ~something.isin(somewhere)
As a worked example:
>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
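The same session can be written as a standalone script, roughly as below. This is a sketch rather than a canonical form: the list is renamed to wanted (a name not used in the original post) so that it does not share a name with the column.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# 'wanted' is a hypothetical name, chosen so the list does not clash with the column.
wanted = ['UK', 'China']

# Boolean Series: True where the column value is one of the wanted values.
mask = df['countries'].isin(wanted)

in_df = df[mask]        # SQL: WHERE countries IN ('UK', 'China')
not_in_df = df[~mask]   # SQL: WHERE countries NOT IN ('UK', 'China')
```

Computing the mask once and negating it with ~ keeps the two filters consistent and avoids running isin twice.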
edited Apr 15 '18 at 17:52 by jpp
answered Nov 13 '13 at 17:13 by DSM
isin is not inverse sin()? :D – Kos, Nov 13 '13 at 17:15
Just an FYI, the @LondonRob had his as a DataFrame and yours is a Series. DataFrame's isin was added in .13. – TomAugspurger, Nov 13 '13 at 18:07
Any suggestions for how to do this with pandas 0.12.0? It's the current released version. (Maybe I should just wait for 0.13?!) – LondonRob, Nov 13 '13 at 18:41
@TomAugspurger: like usual, I'm probably missing something. df, both mine and his, is a DataFrame. countries is a list. df[~df.countries.isin(countries)] produces a DataFrame, not a Series, and seems to work even back in 0.11.0.dev-14a04dd. – DSM, Nov 14 '13 at 16:10
This answer is confusing because you keep reusing the countries variable. Well, the OP does it, and that's inherited, but that something is done badly before does not justify doing it badly now. – ifly6, May 18 '18 at 22:20
An alternative solution that uses the .query() method:
In [5]: df.query("countries in @countries")
Out[5]:
  countries
1        UK
3     China
In [6]: df.query("countries not in @countries")
Out[6]:
  countries
0        US
2   Germany
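Inside the query string, a bare name refers to a column and an @-prefixed name refers to a Python variable in the calling scope. A small sketch that makes the distinction visible by giving the list a different name (wanted is an assumed name, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# Hypothetical name for the list, distinct from the 'countries' column.
wanted = ['UK', 'China']

# 'countries' is resolved as a column; '@wanted' is resolved as the local list.
in_df = df.query("countries in @wanted")
not_in_df = df.query("countries not in @wanted")
```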
answered Jul 19 '17 at 12:19 by MaxU
Note that this is currently marked as "experimental" in the docs... – LondonRob, Jul 19 '17 at 14:49
I usually do generic filtering over rows like this:
criterion = lambda row: row['countries'] not in countries
not_in = df[df.apply(criterion, axis=1)]
answered Nov 13 '13 at 17:14 by Kos
FYI, this is much slower than @DSM's solution, which is vectorized. – Jeff, Nov 13 '13 at 17:47
@Jeff I'd expect that, but that's what I fall back to when I need to filter over something unavailable in pandas directly. (I was about to say "like .startswith or regex matching", but just found out about Series.str that has all of that!) – Kos, Nov 14 '13 at 7:42
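As a rough illustration of the Series.str accessor mentioned in the last comment, prefix and regex filters can be written without a row-wise apply. The prefix 'U' and the pattern below are arbitrary choices for this sketch, not something from the thread.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})

# Vectorized prefix test: keep rows whose country starts with 'U'.
starts_with_u = df[df['countries'].str.startswith('U')]

# Vectorized regex test: keep rows whose country starts with 'Ger' or 'Chi'.
# A non-capturing group is used so the pattern has no capture groups.
matches_regex = df[df['countries'].str.contains(r'^(?:Ger|Chi)', regex=True)]
```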
I wanted to filter out the rows of dfbc whose BUSINESS_ID also appears in the BUSINESS_ID column of dfProfilesBusIds.
Finally got it working:
dfbc = dfbc[(dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID']) == False)]
answered Jul 13 '17 at 3:12 by Sam Henderson
You can negate the isin (as done in the accepted answer) rather than comparing to False – cricket_007, Jul 19 '17 at 12:17
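To see that the two spellings select the same rows, here is a minimal check on the question's sample data (not on dfbc, whose contents are not shown in the answer). The name wanted is an assumption for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
wanted = ['UK', 'China']  # hypothetical name for the list of values

mask = df['countries'].isin(wanted)

negated = ~mask              # element-wise NOT on the boolean Series
compared = (mask == False)   # element-wise comparison, as in the answer above

# Both are boolean Series with the same index, name and values.
same = negated.equals(compared)
not_in_df = df[negated]
```

The ~ form is usually preferred because it reads as "NOT IN" and avoids the comparison to a literal False.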
import numpy as np
import pandas as pd
df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']
To implement IN:
df[df.countries.isin(countries)]
To implement NOT IN, as IN over the rest of the countries:
df[df.countries.isin([x for x in np.unique(df.countries) if x not in countries])]
answered Apr 4 '18 at 11:51 by Ioannis Nasios