Pandas groupby multiindex when unique on first level: unexpected results
Python version: 3.5.2; Pandas version: 0.23.1
I am seeing unexpected behavior when I group by both levels of a two-level index and each row is unique on the first level. The code I am running on my data frame with column c is:
df.c.groupby(df.index.names).min()
Everything works as expected when the rows are not unique on the first index. To make this clear, I've included three versions below (the third was added in an edit).
Version 1: Has the expected output
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()
Input:
     c
a b
1 2  3
  2  4
4 5  6
Output:
a  b
1  2    3
4  5    6
Version 2: Has the unexpected output
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()
Input:
     c
a b
1 2  3
4 5  6
Output:
a    3
b    6
Expected Output:
a  b
1  2    3
4  5    6
Version 3: Has the expected output, which is surprising given Version 2.
df = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]], columns=['a', 'b1', 'b2', 'c'])
df = df.set_index(['a','b1','b2']).sort_index()
Input:
         c
a b1 b2
1 2  3   4
4 5  6   7
Output:
a  b1  b2
1  2   3     4
4  5   6     7
python pandas pandas-groupby
asked Nov 19 '18 at 20:20 by Richard Shadrach (edited Nov 19 '18 at 20:46)
2 Answers
Here is a peek into what is going on. Take a look at the name of the series that gets passed into the "applied" function, f.
In the first case (Expected Results):
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()

def f(x):
    # Show the group that pandas passes in, then its minimum
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df.c.groupby(['a','b']).apply(f)
Output:
a  b
1  2    3
   2    4
Name: (1, 2), dtype: int64
3
a  b
4  5    6
Name: (4, 5), dtype: int64
6
Out[292]:
a  b
1  2    3
4  5    6
In the second case (unexpected results), note the name of the series passed in:
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df1 = df1.set_index(['a','b']).sort_index()

def f(x):
    # Same helper as above: print the group, then its minimum
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df1.c.groupby(['a','b']).apply(f)
Output:
a  b
1  2    3
Name: a, dtype: int64
3
a  b
4  5    6
Name: b, dtype: int64
6
Out[293]:
a    3
b    6
Name: c, dtype: int64
These series are what pandas uses to build the result. The naming of the series is the culprit, due to the nature of the data. Why? We would have to look into the pandas source for that.
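A possible explanation, offered as an assumption rather than something confirmed in this thread or checked against the pandas source: when the list passed to Series.groupby has exactly as many entries as the Series has rows, pandas 0.23 appears to treat it as one label per row instead of as a list of index level names. That would fit all three versions in the question, since only Version 2 has two rows and two names. The sketch below does not rely on that ambiguity. It passes per-row labels explicitly as a NumPy array (the name `labels` is illustrative) to reproduce the shape of the unexpected output, next to a level-based grouping for comparison. Newer pandas releases may resolve the plain-list case differently.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df1 = df1.set_index(['a', 'b']).sort_index()

# Explicit per-row labels: row 0 goes to group 'a', row 1 to group 'b'.
# This mirrors what the unexpected output looks like.
labels = np.array(['a', 'b'])
by_labels = df1['c'].groupby(labels).min()

# Grouping by the index levels themselves, named explicitly.
by_levels = df1['c'].groupby(level=['a', 'b']).min()

print(by_labels)   # indexed by the labels 'a' and 'b'
print(by_levels)   # indexed by the tuples (1, 2) and (4, 5)
```

If this reading is right, the two strings were never looked up as level names in the unexpected case; they were simply used as group labels.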
The idiomatic fix for this problem is to use this syntax:
df1.groupby(df1.index.names)['c'].min()
Output:
a  b
1  2    3
4  5    6
Name: c, dtype: int64
answered Nov 19 '18 at 20:40 by Scott Boston (edited Nov 19 '18 at 20:45)
Thanks for your answer. I think it may be helpful to include the results of my now added Version 3 here. With Version 3 in mind, I believe something very odd is going on here.
– Richard Shadrach
Nov 19 '18 at 21:02
@RichardShadrach Nothing really strange is going on here. Use the recommended syntax and you'll get the results you're expecting: df3.groupby(df3.index.names)['c'].min()
– Scott Boston
Nov 19 '18 at 21:20
Is the syntax I was using not recommended? Or is it meant to do something other than what I intended? I find the difference in the result when I simply add a third level to the index mystifying, and dangerous when code depends on the result having a certain index/values.
– RghtHndSd
Nov 19 '18 at 21:26
You can use the level argument of groupby:
>>> df
     c
a b
1 2  3
4 5  6
>>> df.c.groupby(level=[0,1]).min()
a  b
1  2    3
4  5    6
Name: c, dtype: int64
From the docs:
level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular level or levels
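As a rough check that the level argument behaves the same way regardless of how many rows the frame has, here is a minimal sketch that runs it over the three versions from the question. The helpers `min_by_all_levels` and `build` are hypothetical names introduced only for this illustration; they are not part of pandas.

```python
import pandas as pd


def min_by_all_levels(frame, column):
    """Hypothetical helper: minimum of `column` for each unique index tuple."""
    levels = list(range(frame.index.nlevels))  # e.g. [0, 1] for a two-level index
    return frame[column].groupby(level=levels).min()


def build(rows, columns):
    """Hypothetical helper: index the frame by every column except the last."""
    return pd.DataFrame(rows, columns=columns).set_index(columns[:-1]).sort_index()


v1 = build([[1, 2, 3], [4, 5, 6], [1, 2, 4]], ['a', 'b', 'c'])
v2 = build([[1, 2, 3], [4, 5, 6]], ['a', 'b', 'c'])
v3 = build([[1, 2, 3, 4], [4, 5, 6, 7]], ['a', 'b1', 'b2', 'c'])

r1 = min_by_all_levels(v1, 'c')
r2 = min_by_all_levels(v2, 'c')
r3 = min_by_all_levels(v3, 'c')

# Each result keeps the full MultiIndex, including the two-row Version 2.
for result in (r1, r2, r3):
    print(result.index.tolist(), result.tolist())
```

Because the levels are given by position, the number of rows never gets compared with the number of keys, so the result should not depend on whether the two happen to match.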
answered Nov 19 '18 at 20:25 by sacul
Indeed, this gives the expected result (as does changing it to df.groupby(df.index.names).c.min()). However, is the output I showed expected? Is there an explanation for it?
– Richard Shadrach
Nov 19 '18 at 20:28
You can see the difference between the two strategies by looking at the list of groups when you groupby using each method: df.c.groupby(df.index.names).groups.keys() groups by a and b, whereas df.c.groupby(level=[0,1]).groups.keys() groups by the values in the index, i.e. by (1,2) and (4,5)
– sacul
Nov 19 '18 at 20:33
Want more inconsistent behavior? Try selecting as a DataFrame with df[['c']].groupby(df.index.names).min(), which will provide the desired output. I've run into this before, I have to believe it's not intended behavior.
– user3483203
Nov 19 '18 at 20:36
@sacul: Sorry if I'm being dense here, but is getting back the keys ['a', 'b'] from your first line of code in your comment really something I should expect?
– Richard Shadrach
Nov 19 '18 at 20:51
Well, those are the names of the indices, so I guess that yes, I would expect that, but to be honest, I'm not sure why you don't get that for your first example dataframe... Perhaps it's a bug, or I'm missing something somewhere
– sacul
Nov 19 '18 at 20:53