How do I create a sum row and sum column in pandas?

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total

-------------+---------------+----------+------

Straight A's |           240 |       60 |   300

-------------+---------------+----------+------

Not          |         3,760 |      440 | 4,200

-------------+---------------+----------+------

Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},

"Undergraduate": {...},

"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(

    {

        "Undergraduate": {"Straight A's": 240, "Not": 3_760},

        "Graduate": {"Straight A's": 60, "Not": 440},

    }

)

df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

add a comment |

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total

-------------+---------------+----------+------

Straight A's |           240 |       60 |   300

-------------+---------------+----------+------

Not          |         3,760 |      440 | 4,200

-------------+---------------+----------+------

Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},

"Undergraduate": {...},

"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(

    {

        "Undergraduate": {"Straight A's": 240, "Not": 3_760},

        "Graduate": {"Straight A's": 60, "Not": 440},

    }

)

df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

add a comment |

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total

-------------+---------------+----------+------

Straight A's |           240 |       60 |   300

-------------+---------------+----------+------

Not          |         3,760 |      440 | 4,200

-------------+---------------+----------+------

Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},

"Undergraduate": {...},

"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(

    {

        "Undergraduate": {"Straight A's": 240, "Not": 3_760},

        "Graduate": {"Straight A's": 60, "Not": 440},

    }

)

df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total

-------------+---------------+----------+------

Straight A's |           240 |       60 |   300

-------------+---------------+----------+------

Not          |         3,760 |      440 | 4,200

-------------+---------------+----------+------

Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},

"Undergraduate": {...},

"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(

    {

        "Undergraduate": {"Straight A's": 240, "Not": 3_760},

        "Graduate": {"Straight A's": 60, "Not": 440},

    }

)

df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

python pandas

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

asked Nov 21 '18 at 15:07

Wayne Werner

26.7k14110193

add a comment |

4 Answers
4

active

oldest

votes

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd



df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})



#Total sum per row: 

df.loc['Total',:]= df.sum(axis=0)



#Total sum per column: 

df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

5

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

add a comment |

`append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

`append`

I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.

`assign`

I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still in line.

def tc(d):

  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))



df.pipe(tc).T.pipe(tc).T



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

add a comment |

From the original data using crosstab, if just base on your input, you just need melt before crosstab

s=df.reset_index().melt('index')

pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)

Out[33]: 

variable      Graduate  Undergraduate   All

index                                      

Not                440           3760  4200

Straight A's        60            240   300

All                500           4000  4500

Toy data

df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 

# before `agg`, I think your input is the result after `groupby` 

df

Out[37]: 

   c1  c2  c3

0   1   2   1

1   2   2   2

2   2   3   3

3   3   3   4

4   4   3   5





pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins

=True)

Out[38]: 

c2     2     3  All

c1                 

1    1.0   NaN    1

2    2.0   3.0    5

3    NaN   4.0    4

4    NaN   5.0    5

All  3.0  12.0   15

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

add a comment |

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])

>>> df

Out: 

              Graduate  Undergraduate

Straight A's        60            240

Not                440           3760

You can only use df.T to achieve recreating this table:

>>> df_new = df.T

>>> df_new

Out: 

               Straight A's   Not

Graduate                 60   440

Undergraduate           240  3760

After computing the Total by row and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)

>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)

>>> df_new

Out: 

               Straight A's     Not   Total

Graduate               60.0   440.0   500.0

Undergraduate         240.0  3760.0  4000.0

Total                 300.0  4200.0  4500.0

answered Nov 30 '18 at 2:52

TimeSeam

1815

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414960%2fhow-do-i-create-a-sum-row-and-sum-column-in-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd



df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})



#Total sum per row: 

df.loc['Total',:]= df.sum(axis=0)



#Total sum per column: 

df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

5

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

add a comment |

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd



df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})



#Total sum per row: 

df.loc['Total',:]= df.sum(axis=0)



#Total sum per column: 

df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

5

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

add a comment |

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd



df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})



#Total sum per row: 

df.loc['Total',:]= df.sum(axis=0)



#Total sum per column: 

df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd



df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})



#Total sum per row: 

df.loc['Total',:]= df.sum(axis=0)



#Total sum per column: 

df.loc[:,'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

edited Dec 5 '18 at 9:46

answered Nov 21 '18 at 15:12

Archie

556722

answered Nov 21 '18 at 15:12

Archie

556722

answered Nov 21 '18 at 15:12

Archie

556722

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

5

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

add a comment |

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

5

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
– Wayne Werner
Nov 21 '18 at 15:20

That's weird, I get 4200 as it is supposed to? Maybe a typo?
– Archie
Nov 21 '18 at 15:22

@WayneWerner that is because this is an in place operation. It seems you've run it twice
– piRSquared
Nov 21 '18 at 15:23

Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
– Wayne Werner
Nov 21 '18 at 15:27

add a comment |

`append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

`append`

I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.

`assign`

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still in line.

def tc(d):

  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))



df.pipe(tc).T.pipe(tc).T



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

add a comment |

`append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

`append`

I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.

`assign`

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still in line.

def tc(d):

  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))



df.pipe(tc).T.pipe(tc).T



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

add a comment |

`append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

`append`

I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.

`assign`

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still in line.

def tc(d):

  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))



df.pipe(tc).T.pipe(tc).T



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

`append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

`append`

I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.

`assign`

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still in line.

def tc(d):

  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))



df.pipe(tc).T.pipe(tc).T



              Graduate  Undergraduate  Total

Not                440           3760   4200

Straight A's        60            240    300

Total              500           4000   4500

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

edited Nov 21 '18 at 15:27

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

answered Nov 21 '18 at 15:09

piRSquared

152k22144286

add a comment |

From the original data using crosstab, if just base on your input, you just need melt before crosstab

s=df.reset_index().melt('index')

pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)

Out[33]: 

variable      Graduate  Undergraduate   All

index                                      

Not                440           3760  4200

Straight A's        60            240   300

All                500           4000  4500

Toy data

df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 

# before `agg`, I think your input is the result after `groupby` 

df

Out[37]: 

   c1  c2  c3

0   1   2   1

1   2   2   2

2   2   3   3

3   3   3   4

4   4   3   5





pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins

=True)

Out[38]: 

c2     2     3  All

c1                 

1    1.0   NaN    1

2    2.0   3.0    5

3    NaN   4.0    4

4    NaN   5.0    5

All  3.0  12.0   15

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

add a comment |

From the original data using crosstab, if just base on your input, you just need melt before crosstab

s=df.reset_index().melt('index')

pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)

Out[33]: 

variable      Graduate  Undergraduate   All

index                                      

Not                440           3760  4200

Straight A's        60            240   300

All                500           4000  4500

Toy data

df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 

# before `agg`, I think your input is the result after `groupby` 

df

Out[37]: 

   c1  c2  c3

0   1   2   1

1   2   2   2

2   2   3   3

3   3   3   4

4   4   3   5





pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins

=True)

Out[38]: 

c2     2     3  All

c1                 

1    1.0   NaN    1

2    2.0   3.0    5

3    NaN   4.0    4

4    NaN   5.0    5

All  3.0  12.0   15

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

add a comment |

From the original data using crosstab, if just base on your input, you just need melt before crosstab

s=df.reset_index().melt('index')

pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)

Out[33]: 

variable      Graduate  Undergraduate   All

index                                      

Not                440           3760  4200

Straight A's        60            240   300

All                500           4000  4500

Toy data

df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 

# before `agg`, I think your input is the result after `groupby` 

df

Out[37]: 

   c1  c2  c3

0   1   2   1

1   2   2   2

2   2   3   3

3   3   3   4

4   4   3   5





pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins

=True)

Out[38]: 

c2     2     3  All

c1                 

1    1.0   NaN    1

2    2.0   3.0    5

3    NaN   4.0    4

4    NaN   5.0    5

All  3.0  12.0   15

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

From the original data using crosstab, if just base on your input, you just need melt before crosstab

s=df.reset_index().melt('index')

pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)

Out[33]: 

variable      Graduate  Undergraduate   All

index                                      

Not                440           3760  4200

Straight A's        60            240   300

All                500           4000  4500

Toy data

df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 

# before `agg`, I think your input is the result after `groupby` 

df

Out[37]: 

   c1  c2  c3

0   1   2   1

1   2   2   2

2   2   3   3

3   3   3   4

4   4   3   5





pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins

=True)

Out[38]: 

c2     2     3  All

c1                 

1    1.0   NaN    1

2    2.0   3.0    5

3    NaN   4.0    4

4    NaN   5.0    5

All  3.0  12.0   15

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

edited Nov 21 '18 at 15:21

answered Nov 21 '18 at 15:16

W-B

102k73163

answered Nov 21 '18 at 15:16

W-B

102k73163

answered Nov 21 '18 at 15:16

W-B

102k73163

add a comment |

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])

>>> df

Out: 

              Graduate  Undergraduate

Straight A's        60            240

Not                440           3760

You can only use df.T to achieve recreating this table:

>>> df_new = df.T

>>> df_new

Out: 

               Straight A's   Not

Graduate                 60   440

Undergraduate           240  3760

After computing the Total by row and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)

>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)

>>> df_new

Out: 

               Straight A's     Not   Total

Graduate               60.0   440.0   500.0

Undergraduate         240.0  3760.0  4000.0

Total                 300.0  4200.0  4500.0

answered Nov 30 '18 at 2:52

TimeSeam

1815

add a comment |

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])

>>> df

Out: 

              Graduate  Undergraduate

Straight A's        60            240

Not                440           3760

You can only use df.T to achieve recreating this table:

>>> df_new = df.T

>>> df_new

Out: 

               Straight A's   Not

Graduate                 60   440

Undergraduate           240  3760

After computing the Total by row and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)

>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)

>>> df_new

Out: 

               Straight A's     Not   Total

Graduate               60.0   440.0   500.0

Undergraduate         240.0  3760.0  4000.0

Total                 300.0  4200.0  4500.0

answered Nov 30 '18 at 2:52

TimeSeam

1815

add a comment |

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])

>>> df

Out: 

              Graduate  Undergraduate

Straight A's        60            240

Not                440           3760

You can only use df.T to achieve recreating this table:

>>> df_new = df.T

>>> df_new

Out: 

               Straight A's   Not

Graduate                 60   440

Undergraduate           240  3760

After computing the Total by row and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)

>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)

>>> df_new

Out: 

               Straight A's     Not   Total

Graduate               60.0   440.0   500.0

Undergraduate         240.0  3760.0  4000.0

Total                 300.0  4200.0  4500.0

answered Nov 30 '18 at 2:52

TimeSeam

1815

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])

>>> df

Out: 

              Graduate  Undergraduate

Straight A's        60            240

Not                440           3760

You can only use df.T to achieve recreating this table:

>>> df_new = df.T

>>> df_new

Out: 

               Straight A's   Not

Graduate                 60   440

Undergraduate           240  3760

After computing the Total by row and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)

>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)

>>> df_new

Out: 

               Straight A's     Not   Total

Graduate               60.0   440.0   500.0

Undergraduate         240.0  3760.0  4000.0

Total                 300.0  4200.0  4500.0

answered Nov 30 '18 at 2:52

TimeSeam

1815

answered Nov 30 '18 at 2:52

TimeSeam

1815

answered Nov 30 '18 at 2:52

TimeSeam

1815

answered Nov 30 '18 at 2:52

TimeSeam

1815

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu