MultiIndex pivot in pandas, long-to-wide problem












0















I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.



My data looks like this:



>>> df.head()
s_name geo zip date value
0 A002X zip 00601 2010 1
1 A002Y zip 00601 2010 2
2 A003X zip 00601 2010 3
3 A003Y zip 00601 2010 4
...
. A002X zip 00602 2010 8
. A002Y zip 00602 2010 9
. A003X zip 00602 2010 10
. A003Y zip 00602 2010 11


I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):



>>> df2.head()
A002X A002Y A003X A003Y geo zip date
0 1 2 3 4 zip 00601 2010
1 8 9 10 11 zip 00602 2010
2 19 20 21 22 zip 00603 2010
3 31 32 33 45 zip 00604 2010


I've tried a couple like this



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()


I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:



>>> df.pivot(columns='name', values=["value"]).head()


How do I get the format I'm looking for?










share|improve this question




















  • 1





    Does each group {'A002X', ....} have the same number of observations?

    – Daniel Mesejo
    Jan 3 at 0:13













  • It ought to. (But data is data....)

    – Mittenchops
    Jan 3 at 0:18
















0















I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.



My data looks like this:



>>> df.head()
s_name geo zip date value
0 A002X zip 00601 2010 1
1 A002Y zip 00601 2010 2
2 A003X zip 00601 2010 3
3 A003Y zip 00601 2010 4
...
. A002X zip 00602 2010 8
. A002Y zip 00602 2010 9
. A003X zip 00602 2010 10
. A003Y zip 00602 2010 11


I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):



>>> df2.head()
A002X A002Y A003X A003Y geo zip date
0 1 2 3 4 zip 00601 2010
1 8 9 10 11 zip 00602 2010
2 19 20 21 22 zip 00603 2010
3 31 32 33 45 zip 00604 2010


I've tried a couple like this



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()


I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:



>>> df.pivot(columns='name', values=["value"]).head()


How do I get the format I'm looking for?










share|improve this question




















  • 1





    Does each group {'A002X', ....} have the same number of observations?

    – Daniel Mesejo
    Jan 3 at 0:13













  • It ought to. (But data is data....)

    – Mittenchops
    Jan 3 at 0:18














0












0








0








I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.



My data looks like this:



>>> df.head()
s_name geo zip date value
0 A002X zip 00601 2010 1
1 A002Y zip 00601 2010 2
2 A003X zip 00601 2010 3
3 A003Y zip 00601 2010 4
...
. A002X zip 00602 2010 8
. A002Y zip 00602 2010 9
. A003X zip 00602 2010 10
. A003Y zip 00602 2010 11


I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):



>>> df2.head()
A002X A002Y A003X A003Y geo zip date
0 1 2 3 4 zip 00601 2010
1 8 9 10 11 zip 00602 2010
2 19 20 21 22 zip 00603 2010
3 31 32 33 45 zip 00604 2010


I've tried a couple like this



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()


I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:



>>> df.pivot(columns='name', values=["value"]).head()


How do I get the format I'm looking for?










share|improve this question
















I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.



My data looks like this:



>>> df.head()
s_name geo zip date value
0 A002X zip 00601 2010 1
1 A002Y zip 00601 2010 2
2 A003X zip 00601 2010 3
3 A003Y zip 00601 2010 4
...
. A002X zip 00602 2010 8
. A002Y zip 00602 2010 9
. A003X zip 00602 2010 10
. A003Y zip 00602 2010 11


I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):



>>> df2.head()
A002X A002Y A003X A003Y geo zip date
0 1 2 3 4 zip 00601 2010
1 8 9 10 11 zip 00602 2010
2 19 20 21 22 zip 00603 2010
3 31 32 33 45 zip 00604 2010


I've tried a couple like this



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()


I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:



>>> df.pivot(columns='name', values=["value"]).head()


How do I get the format I'm looking for?







pandas pandas-groupby






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 2 at 23:50







Mittenchops

















asked Jan 2 at 23:17









MittenchopsMittenchops

6,5712268141




6,5712268141








  • 1





    Does each group {'A002X', ....} have the same number of observations?

    – Daniel Mesejo
    Jan 3 at 0:13













  • It ought to. (But data is data....)

    – Mittenchops
    Jan 3 at 0:18














  • 1





    Does each group {'A002X', ....} have the same number of observations?

    – Daniel Mesejo
    Jan 3 at 0:13













  • It ought to. (But data is data....)

    – Mittenchops
    Jan 3 at 0:18








1




1





Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13







Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13















It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18





It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18












2 Answers
2






active

oldest

votes


















3














What I will do pivot_table



s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')
s
s_name A002X A002Y A003X A003Y
geo zip date
zip 601 2010 1 2 3 4
602 2010 8 9 10 11
#s=s.reset_index()





share|improve this answer































    2














    It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns



    This leaves a multi-index in the final output



    Example:



    # with df as the following
    s_name geo zip date value
    0 A002X zip 601 2010 1
    1 A002Y zip 601 2010 2
    2 A003X zip 601 2010 3
    3 A003Y zip 601 2010 4

    ids = df.columns[:-1].tolist()
    df2 = df.set_index(ids).unstack(0)
    df2
    # outputs:
    value
    s_name A002X A002Y A003X A003Y
    geo zip date
    zip 601 2010 1 2 3 4


    Then flatten the multiindex, keeping the 2nd level:



    df2.columns = df2.columns.get_level_values(1)





    share|improve this answer


























    • The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

      – Mittenchops
      Jan 3 at 0:24






    • 2





      @Mittenchops I believe is df2.

      – Daniel Mesejo
      Jan 3 at 0:36











    • @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

      – Haleemur Ali
      Jan 3 at 0:47












    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014430%2fmultiindex-pivot-in-pandas-long-to-wide-problem%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    What I will do pivot_table



    s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')
    s
    s_name A002X A002Y A003X A003Y
    geo zip date
    zip 601 2010 1 2 3 4
    602 2010 8 9 10 11
    #s=s.reset_index()





    share|improve this answer




























      3














      What I will do pivot_table



      s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')
      s
      s_name A002X A002Y A003X A003Y
      geo zip date
      zip 601 2010 1 2 3 4
      602 2010 8 9 10 11
      #s=s.reset_index()





      share|improve this answer


























        3












        3








        3







        What I will do pivot_table



        s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')
        s
        s_name A002X A002Y A003X A003Y
        geo zip date
        zip 601 2010 1 2 3 4
        602 2010 8 9 10 11
        #s=s.reset_index()





        share|improve this answer













        What I will do pivot_table



        s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')
        s
        s_name A002X A002Y A003X A003Y
        geo zip date
        zip 601 2010 1 2 3 4
        602 2010 8 9 10 11
        #s=s.reset_index()






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 3 at 0:46









        Wen-BenWen-Ben

        122k83671




        122k83671

























            2














            It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns



            This leaves a multi-index in the final output



            Example:



            # with df as the following
            s_name geo zip date value
            0 A002X zip 601 2010 1
            1 A002Y zip 601 2010 2
            2 A003X zip 601 2010 3
            3 A003Y zip 601 2010 4

            ids = df.columns[:-1].tolist()
            df2 = df.set_index(ids).unstack(0)
            df2
            # outputs:
            value
            s_name A002X A002Y A003X A003Y
            geo zip date
            zip 601 2010 1 2 3 4


            Then flatten the multiindex, keeping the 2nd level:



            df2.columns = df2.columns.get_level_values(1)





            share|improve this answer


























            • The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

              – Mittenchops
              Jan 3 at 0:24






            • 2





              @Mittenchops I believe is df2.

              – Daniel Mesejo
              Jan 3 at 0:36











            • @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

              – Haleemur Ali
              Jan 3 at 0:47
















            2














            It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns



            This leaves a multi-index in the final output



            Example:



            # with df as the following
            s_name geo zip date value
            0 A002X zip 601 2010 1
            1 A002Y zip 601 2010 2
            2 A003X zip 601 2010 3
            3 A003Y zip 601 2010 4

            ids = df.columns[:-1].tolist()
            df2 = df.set_index(ids).unstack(0)
            df2
            # outputs:
            value
            s_name A002X A002Y A003X A003Y
            geo zip date
            zip 601 2010 1 2 3 4


            Then flatten the multiindex, keeping the 2nd level:



            df2.columns = df2.columns.get_level_values(1)





            share|improve this answer


























            • The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

              – Mittenchops
              Jan 3 at 0:24






            • 2





              @Mittenchops I believe is df2.

              – Daniel Mesejo
              Jan 3 at 0:36











            • @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

              – Haleemur Ali
              Jan 3 at 0:47














            2












            2








            2







            It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns



            This leaves a multi-index in the final output



            Example:



            # with df as the following
            s_name geo zip date value
            0 A002X zip 601 2010 1
            1 A002Y zip 601 2010 2
            2 A003X zip 601 2010 3
            3 A003Y zip 601 2010 4

            ids = df.columns[:-1].tolist()
            df2 = df.set_index(ids).unstack(0)
            df2
            # outputs:
            value
            s_name A002X A002Y A003X A003Y
            geo zip date
            zip 601 2010 1 2 3 4


            Then flatten the multiindex, keeping the 2nd level:



            df2.columns = df2.columns.get_level_values(1)





            share|improve this answer















            It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns



            This leaves a multi-index in the final output



            Example:



            # with df as the following
            s_name geo zip date value
            0 A002X zip 601 2010 1
            1 A002Y zip 601 2010 2
            2 A003X zip 601 2010 3
            3 A003Y zip 601 2010 4

            ids = df.columns[:-1].tolist()
            df2 = df.set_index(ids).unstack(0)
            df2
            # outputs:
            value
            s_name A002X A002Y A003X A003Y
            geo zip date
            zip 601 2010 1 2 3 4


            Then flatten the multiindex, keeping the 2nd level:



            df2.columns = df2.columns.get_level_values(1)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jan 3 at 0:46

























            answered Jan 3 at 0:18









            Haleemur AliHaleemur Ali

            12.7k21741




            12.7k21741













            • The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

              – Mittenchops
              Jan 3 at 0:24






            • 2





              @Mittenchops I believe is df2.

              – Daniel Mesejo
              Jan 3 at 0:36











            • @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

              – Haleemur Ali
              Jan 3 at 0:47



















            • The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

              – Mittenchops
              Jan 3 at 0:24






            • 2





              @Mittenchops I believe is df2.

              – Daniel Mesejo
              Jan 3 at 0:36











            • @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

              – Haleemur Ali
              Jan 3 at 0:47

















            The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

            – Mittenchops
            Jan 3 at 0:24





            The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

            – Mittenchops
            Jan 3 at 0:24




            2




            2





            @Mittenchops I believe is df2.

            – Daniel Mesejo
            Jan 3 at 0:36





            @Mittenchops I believe is df2.

            – Daniel Mesejo
            Jan 3 at 0:36













            @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

            – Haleemur Ali
            Jan 3 at 0:47





            @Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

            – Haleemur Ali
            Jan 3 at 0:47


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014430%2fmultiindex-pivot-in-pandas-long-to-wide-problem%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Can a sorcerer learn a 5th-level spell early by creating spell slots using the Font of Magic feature?

            Does disintegrating a polymorphed enemy still kill it after the 2018 errata?

            A Topological Invariant for $pi_3(U(n))$