how to group the rows and sum the values in one column in python

I have a tab separated file like this example:

small example:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676883   137676900   123 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    137676900   137676949   42  chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   92  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

chr5    139945450   139945500   77  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:

expected output:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676900   137676949   165 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   169 chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I am trying to do that in python using the following command but it does not really work. do you know how to fix it?

import pandas as pd

df = pd.read_csv('myfile.txt', sep='t', header=None)

df = df.groupby(5, 6, 7, 8).sum()

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11

add a comment |

I have a tab separated file like this example:

small example:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676883   137676900   123 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    137676900   137676949   42  chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   92  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

chr5    139945450   139945500   77  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:

expected output:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676900   137676949   165 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   169 chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I am trying to do that in python using the following command but it does not really work. do you know how to fix it?

import pandas as pd

df = pd.read_csv('myfile.txt', sep='t', header=None)

df = df.groupby(5, 6, 7, 8).sum()

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11

add a comment |

I have a tab separated file like this example:

small example:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676883   137676900   123 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    137676900   137676949   42  chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   92  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

chr5    139945450   139945500   77  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:

expected output:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676900   137676949   165 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   169 chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I am trying to do that in python using the following command but it does not really work. do you know how to fix it?

import pandas as pd

df = pd.read_csv('myfile.txt', sep='t', header=None)

df = df.groupby(5, 6, 7, 8).sum()

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

I have a tab separated file like this example:

small example:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676883   137676900   123 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    137676900   137676949   42  chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   92  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

chr5    139945450   139945500   77  chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:

expected output:

chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1

chr5    137676900   137676949   165 chr5    137676883   137676949   FAM53C  ENST00000434981.2

chr5    139944400   139944450   169 chr5    139944064   139946344   SLC35A4 ENST00000323146.3

I am trying to do that in python using the following command but it does not really work. do you know how to fix it?

import pandas as pd

df = pd.read_csv('myfile.txt', sep='t', header=None)

df = df.groupby(5, 6, 7, 8).sum()

python pandas

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

edited Jan 2 at 12:45

meW

2,858120

edited Jan 2 at 12:45

meW

2,858120

edited Jan 2 at 12:45

meW

2,858120

asked Jan 2 at 12:07

user10657934

506

asked Jan 2 at 12:07

user10657934

506

asked Jan 2 at 12:07

user10657934

506

Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11

add a comment |

Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11

Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11

add a comment |

4 Answers
4

active

oldest

votes

You just have to group the cols:

df.groupby([5,6,7,8]).sum()

answered Jan 2 at 13:05

B. M.

13.4k12137

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

add a comment |

You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'last')

d[3] = 'sum'

print (d)

{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676900  137676949  165  chr5  137676883  137676949   FAM53C   

2  chr5  139945450  139945500  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'first')

d[3] = 'sum'

print (d)

{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676883  137676900  165  chr5  137676883  137676949   FAM53C   

2  chr5  139944400  139944450  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

add a comment |

Try this:

df.groupby(['column'])[['another column']].sum()

It groups by column and add sum of another column.
I used so that you understand you can group by multiple columns, like this:

df.groupby(['column1', 'column2'])

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

add a comment |

Input Dataframe: considering only first 3 rows,

data = {'col1': ['chr5', 'chr5', 'chr5'],

        'col2': [112312630,137676883,137676900],

        'col3': [112312650,137676900,137676949],

        'col4': [31, 123,42],

        'col5': ['chr5', 'chr5', 'chr5'],

        'col6': [112312630 ,137676883 ,137676883 ],

        'col7': [112321662, 137676949, 137676949],

        'col8': ['DCP2', 'FAM53C', 'FAM53C'],

       'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']

       }



df = pd.DataFrame(data = data)

df

Do like this,

cols = ['col5', 'col6', 'col7', 'col8']

col_sum = df.groupby(cols)['col4'].sum()

col_sum

Output: this is a multi-level dataframe. Last column is your output,

col5  col6       col7       col8  

chr5  112312630  112321662  DCP2       31

      137676883  137676949  FAM53C    165

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

|
show 1 more comment

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54006086%2fhow-to-group-the-rows-and-sum-the-values-in-one-column-in-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

You just have to group the cols:

df.groupby([5,6,7,8]).sum()

answered Jan 2 at 13:05

B. M.

13.4k12137

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

add a comment |

You just have to group the cols:

df.groupby([5,6,7,8]).sum()

answered Jan 2 at 13:05

B. M.

13.4k12137

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

add a comment |

You just have to group the cols:

df.groupby([5,6,7,8]).sum()

answered Jan 2 at 13:05

B. M.

13.4k12137

You just have to group the cols:

df.groupby([5,6,7,8]).sum()

answered Jan 2 at 13:05

B. M.

13.4k12137

answered Jan 2 at 13:05

B. M.

13.4k12137

answered Jan 2 at 13:05

B. M.

13.4k12137

answered Jan 2 at 13:05

B. M.

13.4k12137

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

add a comment |

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

– jezrael
Jan 2 at 13:11

I put OP on the way ... After of course if can be tuned.

– B. M.
Jan 2 at 13:35

add a comment |

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'last')

d[3] = 'sum'

print (d)

{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676900  137676949  165  chr5  137676883  137676949   FAM53C   

2  chr5  139945450  139945500  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'first')

d[3] = 'sum'

print (d)

{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676883  137676900  165  chr5  137676883  137676949   FAM53C   

2  chr5  139944400  139944450  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

add a comment |

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'last')

d[3] = 'sum'

print (d)

{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676900  137676949  165  chr5  137676883  137676949   FAM53C   

2  chr5  139945450  139945500  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'first')

d[3] = 'sum'

print (d)

{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676883  137676900  165  chr5  137676883  137676949   FAM53C   

2  chr5  139944400  139944450  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

add a comment |

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'last')

d[3] = 'sum'

print (d)

{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676900  137676949  165  chr5  137676883  137676949   FAM53C   

2  chr5  139945450  139945500  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'first')

d[3] = 'sum'

print (d)

{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676883  137676900  165  chr5  137676883  137676949   FAM53C   

2  chr5  139944400  139944450  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'last')

d[3] = 'sum'

print (d)

{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676900  137676949  165  chr5  137676883  137676949   FAM53C   

2  chr5  139945450  139945500  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

cols = [5, 6, 7, 8]

d = dict.fromkeys(df.columns.difference(cols), 'first')

d[3] = 'sum'

print (d)

{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}



df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)

print (df)

      0          1          2    3     4          5          6        7  

0  chr5  112312630  112312650   31  chr5  112312630  112321662     DCP2   

1  chr5  137676883  137676900  165  chr5  137676883  137676949   FAM53C   

2  chr5  139944400  139944450  169  chr5  139944064  139946344  SLC35A4   



                   8  

0  ENST00000543319.1  

1  ENST00000434981.2  

2  ENST00000323146.3

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

edited Jan 2 at 13:13

answered Jan 2 at 13:07

jezrael

350k26313388

answered Jan 2 at 13:07

jezrael

350k26313388

answered Jan 2 at 13:07

jezrael

350k26313388

add a comment |

Try this:

df.groupby(['column'])[['another column']].sum()

It groups by column and add sum of another column.
I used so that you understand you can group by multiple columns, like this:

df.groupby(['column1', 'column2'])

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

add a comment |

Try this:

df.groupby(['column'])[['another column']].sum()

It groups by column and add sum of another column.
I used so that you understand you can group by multiple columns, like this:

df.groupby(['column1', 'column2'])

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

add a comment |

Try this:

df.groupby(['column'])[['another column']].sum()

It groups by column and add sum of another column.
I used so that you understand you can group by multiple columns, like this:

df.groupby(['column1', 'column2'])

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

Try this:

df.groupby(['column'])[['another column']].sum()

It groups by column and add sum of another column.
I used so that you understand you can group by multiple columns, like this:

df.groupby(['column1', 'column2'])

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

answered Jan 2 at 12:10

Mehrdad Pedramfar

6,33411643

add a comment |

Input Dataframe: considering only first 3 rows,

data = {'col1': ['chr5', 'chr5', 'chr5'],

        'col2': [112312630,137676883,137676900],

        'col3': [112312650,137676900,137676949],

        'col4': [31, 123,42],

        'col5': ['chr5', 'chr5', 'chr5'],

        'col6': [112312630 ,137676883 ,137676883 ],

        'col7': [112321662, 137676949, 137676949],

        'col8': ['DCP2', 'FAM53C', 'FAM53C'],

       'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']

       }



df = pd.DataFrame(data = data)

df

Do like this,

cols = ['col5', 'col6', 'col7', 'col8']

col_sum = df.groupby(cols)['col4'].sum()

col_sum

Output: this is a multi-level dataframe. Last column is your output,

col5  col6       col7       col8  

chr5  112312630  112321662  DCP2       31

      137676883  137676949  FAM53C    165

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

|
show 1 more comment

Input Dataframe: considering only first 3 rows,

data = {'col1': ['chr5', 'chr5', 'chr5'],

        'col2': [112312630,137676883,137676900],

        'col3': [112312650,137676900,137676949],

        'col4': [31, 123,42],

        'col5': ['chr5', 'chr5', 'chr5'],

        'col6': [112312630 ,137676883 ,137676883 ],

        'col7': [112321662, 137676949, 137676949],

        'col8': ['DCP2', 'FAM53C', 'FAM53C'],

       'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']

       }



df = pd.DataFrame(data = data)

df

Do like this,

cols = ['col5', 'col6', 'col7', 'col8']

col_sum = df.groupby(cols)['col4'].sum()

col_sum

Output: this is a multi-level dataframe. Last column is your output,

col5  col6       col7       col8  

chr5  112312630  112321662  DCP2       31

      137676883  137676949  FAM53C    165

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

|
show 1 more comment

Input Dataframe: considering only first 3 rows,

data = {'col1': ['chr5', 'chr5', 'chr5'],

        'col2': [112312630,137676883,137676900],

        'col3': [112312650,137676900,137676949],

        'col4': [31, 123,42],

        'col5': ['chr5', 'chr5', 'chr5'],

        'col6': [112312630 ,137676883 ,137676883 ],

        'col7': [112321662, 137676949, 137676949],

        'col8': ['DCP2', 'FAM53C', 'FAM53C'],

       'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']

       }



df = pd.DataFrame(data = data)

df

Do like this,

cols = ['col5', 'col6', 'col7', 'col8']

col_sum = df.groupby(cols)['col4'].sum()

col_sum

Output: this is a multi-level dataframe. Last column is your output,

col5  col6       col7       col8  

chr5  112312630  112321662  DCP2       31

      137676883  137676949  FAM53C    165

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

Input Dataframe: considering only first 3 rows,

data = {'col1': ['chr5', 'chr5', 'chr5'],

        'col2': [112312630,137676883,137676900],

        'col3': [112312650,137676900,137676949],

        'col4': [31, 123,42],

        'col5': ['chr5', 'chr5', 'chr5'],

        'col6': [112312630 ,137676883 ,137676883 ],

        'col7': [112321662, 137676949, 137676949],

        'col8': ['DCP2', 'FAM53C', 'FAM53C'],

       'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']

       }



df = pd.DataFrame(data = data)

df

Do like this,

cols = ['col5', 'col6', 'col7', 'col8']

col_sum = df.groupby(cols)['col4'].sum()

col_sum

Output: this is a multi-level dataframe. Last column is your output,

col5  col6       col7       col8  

chr5  112312630  112321662  DCP2       31

      137676883  137676949  FAM53C    165

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

edited Jan 2 at 12:30

answered Jan 2 at 12:12

Abdur Rehman

626511

answered Jan 2 at 12:12

Abdur Rehman

626511

answered Jan 2 at 12:12

Abdur Rehman

626511

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

|
show 1 more comment

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

– user10657934
Jan 2 at 12:15

@user10657934 provide your dataset file so I can resolve that issue.

– Abdur Rehman
Jan 2 at 12:17

the example in the question is exactly a part of my file

– user10657934
Jan 2 at 12:17

@user10657934 let me make dataframe from this and will give you a code example in a while.

– Abdur Rehman
Jan 2 at 12:19

@user10657934 char5 is a data column or index?

– Abdur Rehman
Jan 2 at 12:20

|
show 1 more comment

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu