Pandas groupby multiindex when unique on first level: unexpected results












Python version: 3.5.2; Pandas version: 0.23.1



I am seeing unexpected behavior when I group by both levels of a two-level MultiIndex and every row is unique on the first level. The code I run on my DataFrame, which has a single column c, is:

df.c.groupby(df.index.names).min()

Everything works as expected when the rows are not unique on the first level. To make this clear, I've included three versions below.



Version 1: Has the expected output

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()

Input:

     c
a b
1 2  3
  2  4
4 5  6

Output:

a  b
1  2    3
4  5    6


Version 2: Has the unexpected output

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()

Input:

     c
a b
1 2  3
4 5  6

Output:

a    3
b    6

Expected output:

a  b
1  2    3
4  5    6


Version 3: Has the expected output, which is surprising given version 2

df = pd.DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]], columns=['a', 'b1', 'b2', 'c'])
df = df.set_index(['a','b1','b2']).sort_index()

Input:

         c
a b1 b2
1 2  3   4
4 5  6   7

Output:

a  b1  b2
1  2   3     4
4  5   6     7









      python pandas pandas-groupby






asked Nov 19 '18 at 20:20 by Richard Shadrach (edited Nov 19 '18 at 20:46)
























          2 Answers
Here is a peek into what is going on. Take a look at the name of the Series that gets passed into the applied function, f.



In the first case (expected results):

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [1, 2, 4]], columns=['a', 'b', 'c'])
df = df.set_index(['a','b']).sort_index()

def f(x):
    # Print each group as it is passed in, then its minimum.
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df.c.groupby(['a','b']).apply(f)

Output:

a  b
1  2    3
   2    4
Name: (1, 2), dtype: int64


3


a  b
4  5    6
Name: (4, 5), dtype: int64


6


Out[292]:

a  b
1  2    3
4  5    6


In the second case (unexpected results), note the name of the Series passed in:

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df1 = df1.set_index(['a','b']).sort_index()

def f(x):
    # Print each group as it is passed in, then its minimum.
    print(x)
    print('\n')
    print(min(x))
    print('\n')
    return min(x)

df1.c.groupby(['a','b']).apply(f)

Output:

a  b
1  2    3
Name: a, dtype: int64


3


a  b
4  5    6
Name: b, dtype: int64


6


Out[293]:

a    3
b    6
Name: c, dtype: int64


pandas uses these Series to build the result, and the name of each one becomes its label in the result's index. The naming of the Series is the culprit here, and it depends on the shape of the data. Why? We would have to look into the pandas source for that.
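One possible explanation, offered here as an assumption rather than something confirmed from the pandas source in this thread: when the list passed to Series.groupby has exactly as many entries as the Series has rows, pandas 0.23 may read it as one group label per row instead of as index level names. The sketch below passes such per-row labels explicitly (as a NumPy array, so there is no ambiguity) and reproduces the a/b result from the second case. The names s, labels, by_labels, and by_levels are illustrative.

```python
import numpy as np
import pandas as pd

# Same two-row frame as in the second case.
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df1 = df1.set_index(['a', 'b']).sort_index()
s = df1['c']

# One explicit label per row: row 0 goes to group 'a', row 1 to group 'b'.
labels = np.array(['a', 'b'])
by_labels = s.groupby(labels).min()
print(by_labels)

# Grouping by index levels does not depend on the number of rows.
by_levels = s.groupby(level=['a', 'b']).min()
print(by_levels)
```

This reading would also be consistent with version 3 of the question, where a list of three names cannot be mistaken for labels on two rows.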



The idiomatic fix for this problem is to use this syntax:

df1.groupby(df1.index.names)['c'].min()

Output:

a  b
1  2    3
4  5    6
Name: c, dtype: int64






answered Nov 19 '18 at 20:40 by Scott Boston (edited Nov 19 '18 at 20:45)












          • Thanks for your answer. I think it may be helpful to include the results of my now added Version 3 here. With Version 3 in mind, I believe something very odd is going on here.
            – Richard Shadrach
            Nov 19 '18 at 21:02










          • @RichardShadrach Nothing really strange is going on here. Use the recommended syntax and you'll get the results you're expecting. df3.groupby(df3.index.names)['c'].min()
            – Scott Boston
            Nov 19 '18 at 21:20












          • Is the syntax I was using not recommended? Or is it to do something other than what I intended? I find the difference in the result when I simply add a third level to the index mystifying and dangerous when code depends on the result having a certain index/values.
            – RghtHndSd
            Nov 19 '18 at 21:26




















You can use the level argument of groupby:

>>> df
     c
a b
1 2  3
4 5  6

>>> df.c.groupby(level=[0,1]).min()
a  b
1  2    3
4  5    6
Name: c, dtype: int64

From the docs:

level : int, level name, or sequence of such, default None
    If the axis is a MultiIndex (hierarchical), group by a particular level or levels









          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 19 '18 at 20:25
          – sacul












          • Indeed, this gives the expected result (as does changing it to df.groupby(df.index.names).c.min()). However, is the output I showed expected? Is there an explanation for it?
            – Richard Shadrach
            Nov 19 '18 at 20:28










          • You can see the difference between the two strategies by looking at the groups each method produces: df.c.groupby(df.index.names).groups.keys() groups by 'a' and 'b', whereas df.c.groupby(level=[0,1]).groups.keys() groups by the values in the index, i.e. by (1, 2) and (4, 5).
            – sacul
            Nov 19 '18 at 20:33








          • Want more inconsistent behavior? Try selecting as a DataFrame with df[['c']].groupby(df.index.names).min(), which will provide the desired output. I've run into this before; I have to believe it's not intended behavior.
            – user3483203
            Nov 19 '18 at 20:36












          • @sacul: Sorry if I'm being dense here, but is getting back the keys ['a', 'b'] from your first line of code in your comment really something I should expect?
            – Richard Shadrach
            Nov 19 '18 at 20:51










          • Well, those are the names of the index levels, so I guess that yes, I would expect that. But to be honest, I'm not sure why you don't get that for your first example dataframe... Perhaps it's a bug, or I'm missing something somewhere.
            – sacul
            Nov 19 '18 at 20:53
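
The observations in the comments above can be checked with a short sketch on the Version 2 frame. The label array `np.array(['a', 'b'])` is a stand-in added here for illustration. That pandas 0.23.1 treated `df.index.names` as such a label array (its length happens to equal the number of rows in Version 2) is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
df = df.set_index(['a', 'b']).sort_index()

# Grouping by level: the group keys are the index values.
level_keys = list(df.c.groupby(level=[0, 1]).groups.keys())

# Grouping by an explicit array of labels, one per row: the keys are
# the labels themselves, which matches the unexpected output shape
# reported in the question (assumed mechanism, see the note above).
labelled = df.c.groupby(np.array(['a', 'b'])).min()

# The two DataFrame-based spellings mentioned in the comments.
names = list(df.index.names)
frame_min = df[['c']].groupby(names).min()
column_min = df.groupby(names).c.min()

print(level_keys)
print(labelled)
print(frame_min)
print(column_min)
```

If the assumption holds, it would also be consistent with Versions 1 and 3 behaving as expected, since in those frames the number of names differs from the number of rows.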







































