MultiIndex pivot in pandas, long-to-wide problem

I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.

My data looks like this:

>>> df.head()

        s_name   geo    zip  date value

0        A002X   zip  00601  2010     1

1        A002Y   zip  00601  2010     2

2        A003X   zip  00601  2010     3

3        A003Y   zip  00601  2010     4

...

.        A002X   zip  00602  2010     8

.        A002Y   zip  00602  2010     9

.        A003X   zip  00602  2010    10

.        A003Y   zip  00602  2010    11

I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):

>>> df2.head()

         A002X A002Y A003X A003Y     geo    zip  date

0            1     2     3     4     zip  00601  2010

1            8     9    10    11     zip  00602  2010

2           19    20    21    22     zip  00603  2010

3           31    32    33    45     zip  00604  2010

I've tried a couple like this

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()

I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:

>>> df.pivot(columns='name', values=["value"]).head()

How do I get the format I'm looking for?

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

1

Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13

It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18

add a comment |

I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.

My data looks like this:

>>> df.head()

        s_name   geo    zip  date value

0        A002X   zip  00601  2010     1

1        A002Y   zip  00601  2010     2

2        A003X   zip  00601  2010     3

3        A003Y   zip  00601  2010     4

...

.        A002X   zip  00602  2010     8

.        A002Y   zip  00602  2010     9

.        A003X   zip  00602  2010    10

.        A003Y   zip  00602  2010    11

I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):

>>> df2.head()

         A002X A002Y A003X A003Y     geo    zip  date

0            1     2     3     4     zip  00601  2010

1            8     9    10    11     zip  00602  2010

2           19    20    21    22     zip  00603  2010

3           31    32    33    45     zip  00604  2010

I've tried a couple like this

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()

I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:

>>> df.pivot(columns='name', values=["value"]).head()

How do I get the format I'm looking for?

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

1

Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13

It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18

add a comment |

I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.

My data looks like this:

>>> df.head()

        s_name   geo    zip  date value

0        A002X   zip  00601  2010     1

1        A002Y   zip  00601  2010     2

2        A003X   zip  00601  2010     3

3        A003Y   zip  00601  2010     4

...

.        A002X   zip  00602  2010     8

.        A002Y   zip  00602  2010     9

.        A003X   zip  00602  2010    10

.        A003Y   zip  00602  2010    11

I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):

>>> df2.head()

         A002X A002Y A003X A003Y     geo    zip  date

0            1     2     3     4     zip  00601  2010

1            8     9    10    11     zip  00602  2010

2           19    20    21    22     zip  00603  2010

3           31    32    33    45     zip  00604  2010

I've tried a couple like this

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()

I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:

>>> df.pivot(columns='name', values=["value"]).head()

How do I get the format I'm looking for?

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

I have what I am sure is a simple long to wide problem, but I cannot get the syntax right on how to handle this when I have multiple columns at play.

My data looks like this:

>>> df.head()

        s_name   geo    zip  date value

0        A002X   zip  00601  2010     1

1        A002Y   zip  00601  2010     2

2        A003X   zip  00601  2010     3

3        A003Y   zip  00601  2010     4

...

.        A002X   zip  00602  2010     8

.        A002Y   zip  00602  2010     9

.        A003X   zip  00602  2010    10

.        A003Y   zip  00602  2010    11

I would like it to look like this, that is, the contents of the s_name column become new columns, and their values are the 'value' column, grouped by (geo,zip,date):

>>> df2.head()

         A002X A002Y A003X A003Y     geo    zip  date

0            1     2     3     4     zip  00601  2010

1            8     9    10    11     zip  00602  2010

2           19    20    21    22     zip  00603  2010

3           31    32    33    45     zip  00604  2010

I've tried a couple like this

>>> df.pivot(columns='s_name', values=["geo","zip","date","value"])



>>> df.pivot(columns='s_name', values=["geo","zip","date","value"]).unstack().reset_index()

I feel like I'm really close, but nothing seems to match up. This, for example, drops the extra non-s_name values:

>>> df.pivot(columns='name', values=["value"]).head()

How do I get the format I'm looking for?

pandas pandas-groupby

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

edited Jan 2 at 23:50

asked Jan 2 at 23:17

Mittenchops

6,5712268141

asked Jan 2 at 23:17

Mittenchops

6,5712268141

asked Jan 2 at 23:17

Mittenchops

6,5712268141

1

Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13

It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18

add a comment |

1

Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13

It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18

Does each group {'A002X', ....} have the same number of observations?

– Daniel Mesejo
Jan 3 at 0:13

It ought to. (But data is data....)

– Mittenchops
Jan 3 at 0:18

add a comment |

2 Answers
2

active

oldest

votes

What I will do pivot_table

s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')

s

s_name        A002X  A002Y  A003X  A003Y

geo zip date                            

zip 601 2010      1      2      3      4

    602 2010      8      9     10     11

#s=s.reset_index()

answered Jan 3 at 0:46

Wen-Ben

122k83671

add a comment |

It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns

This leaves a multi-index in the final output

Example:

# with df as the following

  s_name  geo  zip  date  value

0  A002X  zip  601  2010      1

1  A002Y  zip  601  2010      2

2  A003X  zip  601  2010      3

3  A003Y  zip  601  2010      4



ids = df.columns[:-1].tolist()

df2 = df.set_index(ids).unstack(0)

df2

# outputs:

             value                  

s_name       A002X A002Y A003X A003Y

geo zip date                        

zip 601 2010     1     2     3     4

Then flatten the multiindex, keeping the 2nd level:

df2.columns = df2.columns.get_level_values(1)

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

2

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014430%2fmultiindex-pivot-in-pandas-long-to-wide-problem%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

What I will do pivot_table

s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')

s

s_name        A002X  A002Y  A003X  A003Y

geo zip date                            

zip 601 2010      1      2      3      4

    602 2010      8      9     10     11

#s=s.reset_index()

answered Jan 3 at 0:46

Wen-Ben

122k83671

add a comment |

What I will do pivot_table

s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')

s

s_name        A002X  A002Y  A003X  A003Y

geo zip date                            

zip 601 2010      1      2      3      4

    602 2010      8      9     10     11

#s=s.reset_index()

answered Jan 3 at 0:46

Wen-Ben

122k83671

add a comment |

What I will do pivot_table

s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')

s

s_name        A002X  A002Y  A003X  A003Y

geo zip date                            

zip 601 2010      1      2      3      4

    602 2010      8      9     10     11

#s=s.reset_index()

answered Jan 3 at 0:46

Wen-Ben

122k83671

What I will do pivot_table

s=df.pivot_table(index=['geo','zip','date'],columns='s_name',values='value',aggfunc='sum')

s

s_name        A002X  A002Y  A003X  A003Y

geo zip date                            

zip 601 2010      1      2      3      4

    602 2010      8      9     10     11

#s=s.reset_index()

answered Jan 3 at 0:46

Wen-Ben

122k83671

answered Jan 3 at 0:46

Wen-Ben

122k83671

answered Jan 3 at 0:46

Wen-Ben

122k83671

answered Jan 3 at 0:46

Wen-Ben

122k83671

add a comment |

It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns

This leaves a multi-index in the final output

Example:

# with df as the following

  s_name  geo  zip  date  value

0  A002X  zip  601  2010      1

1  A002Y  zip  601  2010      2

2  A003X  zip  601  2010      3

3  A003Y  zip  601  2010      4



ids = df.columns[:-1].tolist()

df2 = df.set_index(ids).unstack(0)

df2

# outputs:

             value                  

s_name       A002X A002Y A003X A003Y

geo zip date                        

zip 601 2010     1     2     3     4

Then flatten the multiindex, keeping the 2nd level:

df2.columns = df2.columns.get_level_values(1)

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

2

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

add a comment |

It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns

This leaves a multi-index in the final output

Example:

# with df as the following

  s_name  geo  zip  date  value

0  A002X  zip  601  2010      1

1  A002Y  zip  601  2010      2

2  A003X  zip  601  2010      3

3  A003Y  zip  601  2010      4



ids = df.columns[:-1].tolist()

df2 = df.set_index(ids).unstack(0)

df2

# outputs:

             value                  

s_name       A002X A002Y A003X A003Y

geo zip date                        

zip 601 2010     1     2     3     4

Then flatten the multiindex, keeping the 2nd level:

df2.columns = df2.columns.get_level_values(1)

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

2

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

add a comment |

It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns

This leaves a multi-index in the final output

Example:

# with df as the following

  s_name  geo  zip  date  value

0  A002X  zip  601  2010      1

1  A002Y  zip  601  2010      2

2  A003X  zip  601  2010      3

3  A003Y  zip  601  2010      4



ids = df.columns[:-1].tolist()

df2 = df.set_index(ids).unstack(0)

df2

# outputs:

             value                  

s_name       A002X A002Y A003X A003Y

geo zip date                        

zip 601 2010     1     2     3     4

Then flatten the multiindex, keeping the 2nd level:

df2.columns = df2.columns.get_level_values(1)

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

It is necessary to set the dimensions as indices, and unstack the dimension that is desired as columns

This leaves a multi-index in the final output

Example:

# with df as the following

  s_name  geo  zip  date  value

0  A002X  zip  601  2010      1

1  A002Y  zip  601  2010      2

2  A003X  zip  601  2010      3

3  A003Y  zip  601  2010      4



ids = df.columns[:-1].tolist()

df2 = df.set_index(ids).unstack(0)

df2

# outputs:

             value                  

s_name       A002X A002Y A003X A003Y

geo zip date                        

zip 601 2010     1     2     3     4

Then flatten the multiindex, keeping the 2nd level:

df2.columns = df2.columns.get_level_values(1)

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

edited Jan 3 at 0:46

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

answered Jan 3 at 0:18

Haleemur Ali

12.7k21741

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

2

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

add a comment |

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

2

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

The flatten gives the error: IndexError: Too many levels: Index has only 1 level, not 2 df or df2?

– Mittenchops
Jan 3 at 0:24

@Mittenchops I believe is df2.

– Daniel Mesejo
Jan 3 at 0:36

@Mittenchops, that was a typo in the last line as pointed out by Daniel Mesejo. The typo has been corrected.

– Haleemur Ali
Jan 3 at 0:47

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu