Pandas groupby multiindex when unique on first level: unexpected results

Python version: 3.5.2; Pandas version: 0.23.1

I am seeing unexpected behavior when I group by both levels of a two-level index and every row is unique on the first level. The code I run on my data frame, which has a column c, is:

df.c.groupby(df.index.names).min()

Everything works as expected when the rows are not unique on the first index. To make this clear, I've included two versions below. Edit: Now including three versions!

Version 1: Has the expected output

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])

df = df.set_index(['a','b']).sort_index()

Input:

Output:

Version 2: Has the unexpected output

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

df = df.set_index(['a','b']).sort_index()

Input:

Output:

a    3

b    6

Expected Output:

Version 3: Has the expected output, which is surprising with Version 2 in mind.

df = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]], columns=['a', 'b1', 'b2', 'c'])

df = df.set_index(['a','b1','b2']).sort_index()

Input:

Output:

a  b1  b2

1  2   3     4

4  5   6     7

edited Nov 19 '18 at 20:46

asked Nov 19 '18 at 20:20

Richard Shadrach

112

python pandas pandas-groupby

2 Answers

Here is a peek into what is going on. Take a look at the name of the series that gets passed into the "applied" function, f.

In the first case (Expected Results):

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])

df = df.set_index(['a','b']).sort_index()



def f(x):
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df.c.groupby(['a','b']).apply(f)

Output:

a  b

1  2    3

   2    4

Name: (1, 2), dtype: int64





3





a  b

4  5    6

Name: (4, 5), dtype: int64





6





Out[292]:



a  b

1  2    3

4  5    6

In the second case (unexpected results), note the name of the series passed in:

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

df1 = df1.set_index(['a','b']).sort_index()

def f(x):
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df1.c.groupby(['a','b']).apply(f)

Output:

a  b

1  2    3

Name: a, dtype: int64





3





a  b

4  5    6

Name: b, dtype: int64





6





Out[293]:



a    3

b    6

Name: c, dtype: int64

pandas uses these series to build the result. The naming of the series is the culprit, due to the nature of the data. Why? Well, we'll have to look into the code for that.
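A minimal sketch of what the series names above hint at, under an assumption this thread does not confirm: when the list of keys has exactly as many entries as the series has rows, pandas may read it as one group label per row instead of as index level names. Passing an explicit NumPy array of labels (row_labels is a made-up name) reproduces the Version 2 output, while level= groups by the index values:

```python
import numpy as np
import pandas as pd

# Version 2 from the question: two rows, two index levels.
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df1 = df1.set_index(['a', 'b']).sort_index()

# Assumption for illustration: give pandas one explicit label per row.
# Row 0 lands in group 'a', row 1 lands in group 'b'.
row_labels = np.array(['a', 'b'])
by_labels = df1['c'].groupby(row_labels).min()

# For comparison, group by the index levels themselves.
by_levels = df1['c'].groupby(level=['a', 'b']).min()

print(by_labels)   # indexed by the labels 'a' and 'b'
print(by_levels)   # indexed by the (a, b) pairs from the MultiIndex
```

If that reading is right, it would also fit Version 1 (three rows, two keys) and Version 3 (two rows, three keys), where the lengths differ and the keys are looked up as level names.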

The idiomatic fix for this problem is to use this syntax:

df1.groupby(df1.index.names)['c'].min()

Output:

a  b

1  2    3

4  5    6

Name: c, dtype: int64
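As a rough check that grouping the DataFrame first and selecting c afterwards behaves the same for two and three index levels, here is a small sketch. The helper name min_c_by_index is made up for illustration; df2 and df3 stand for Versions 2 and 3 from the question.

```python
import pandas as pd

# Versions 2 and 3 from the question.
df2 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df2 = df2.set_index(['a', 'b']).sort_index()
df3 = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]],
                   columns=['a', 'b1', 'b2', 'c'])
df3 = df3.set_index(['a', 'b1', 'b2']).sort_index()

def min_c_by_index(frame):
    """Group the frame by all of its index level names, then take min of c."""
    # list(...) turns the index names into a plain list of keys.
    return frame.groupby(list(frame.index.names))['c'].min()

res2 = min_c_by_index(df2)
res3 = min_c_by_index(df3)
print(res2)
print(res3)
```

Both results keep the full MultiIndex, so code that depends on the shape of the result index should not see it change with the number of rows.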

edited Nov 19 '18 at 20:45

answered Nov 19 '18 at 20:40

Scott Boston

52.4k72955

Thanks for your answer. I think it may be helpful to include the results of my now added Version 3 here. With Version 3 in mind, I believe something very odd is going on here.
– Richard Shadrach
Nov 19 '18 at 21:02

@RichardShadrach Nothing really strange is going on here. Use the recommended syntax and you'll get the results you're expecting. df3.groupby(df3.index.names)['c'].min()
– Scott Boston
Nov 19 '18 at 21:20

Is the syntax I was using not recommended? Or is it meant to do something other than what I intended? I find the difference in the result when I simply add a third level to the index mystifying, and dangerous when code depends on the result having a certain index/values.
– RghtHndSd
Nov 19 '18 at 21:26


You can use the level argument of groupby:

>>> df

     c

a b   

1 2  3

4 5  6



>>> df.c.groupby(level=[0,1]).min()

a  b

1  2    3

4  5    6

Name: c, dtype: int64

From the docs:

level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels
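To illustrate the docs excerpt, a short sketch: level accepts positions, names, or a single level. It uses the Version 1 frame from the question; the variable names are only for illustration.

```python
import pandas as pd

# Version 1 from the question: the pair (1, 2) appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])
df = df.set_index(['a', 'b']).sort_index()

# Group by both levels, given as positions or as names.
by_position = df['c'].groupby(level=[0, 1]).min()
by_name = df['c'].groupby(level=['a', 'b']).min()

# Group by the first level only.
first_only = df['c'].groupby(level='a').min()

print(by_position)
print(first_only)
```

Because level never takes a per-row array, it should not depend on how many rows the frame happens to have.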

answered Nov 19 '18 at 20:25

sacul

30k41740

Indeed, this gives the expected result (as does changing it to df.groupby(df.index.names).c.min()). However, is the output I showed expected? Is there an explanation for it?
– Richard Shadrach
Nov 19 '18 at 20:28

You can see the difference between the two strategies by looking at the groups each method produces: df.c.groupby(df.index.names).groups.keys() groups by a and b, whereas df.c.groupby(level=[0,1]).groups.keys() groups by the values in the index, i.e. by (1,2) and (4,5).
– sacul
Nov 19 '18 at 20:33
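A small sketch of the kind of inspection described in the comment above. To keep it independent of the pandas version, the 'a'/'b' grouping is produced here with an explicit array of per-row labels, which is an assumption standing in for df.index.names on the two-row frame.

```python
import numpy as np
import pandas as pd

# Version 2 from the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df = df.set_index(['a', 'b']).sort_index()

# Keys when grouping by the index levels: one tuple per (a, b) pair.
level_keys = list(df['c'].groupby(level=[0, 1]).groups.keys())

# Keys when grouping by one explicit label per row (illustrative stand-in).
label_keys = list(df['c'].groupby(np.array(['a', 'b'])).groups.keys())

print(level_keys)
print(label_keys)
```

Checking .groups before aggregating is a cheap way to confirm what a groupby call actually grouped on.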


Want more inconsistent behavior? Try selecting as a DataFrame with df[['c']].groupby(df.index.names).min(), which will provide the desired output. I've run into this before, and I have to believe it's not intended behavior.
– user3483203
Nov 19 '18 at 20:36

@sacul: Sorry if I'm being dense here, but is getting back the keys ['a', 'b'] from your first line of code in your comment really something I should expect?
– Richard Shadrach
Nov 19 '18 at 20:51

Well, those are the names of the indices, so I guess that yes, I would expect that, but to be honest, I'm not sure why you don't get that for your first example dataframe... Perhaps it's a bug, or I'm missing something somewhere.
– sacul
Nov 19 '18 at 20:53
