how to group the rows and sum the values in one column in python
I have a tab separated file like this example:
small example:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I want to group the lines based on 5th
, 6th
and 7th
columns and sum the values of 4th
column in each group.
here is the expected output:
expected output:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I am trying to do that in python using the following command but it does not really work. do you know how to fix it?
import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()
python pandas
add a comment |
I have a tab separated file like this example:
small example:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I want to group the lines based on 5th
, 6th
and 7th
columns and sum the values of 4th
column in each group.
here is the expected output:
expected output:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I am trying to do that in python using the following command but it does not really work. do you know how to fix it?
import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()
python pandas
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11
add a comment |
I have a tab separated file like this example:
small example:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I want to group the lines based on 5th
, 6th
and 7th
columns and sum the values of 4th
column in each group.
here is the expected output:
expected output:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I am trying to do that in python using the following command but it does not really work. do you know how to fix it?
import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()
python pandas
I have a tab separated file like this example:
small example:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I want to group the lines based on 5th
, 6th
and 7th
columns and sum the values of 4th
column in each group.
here is the expected output:
expected output:
chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2 ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
I am trying to do that in python using the following command but it does not really work. do you know how to fix it?
import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()
python pandas
python pandas
edited Jan 2 at 12:45


meW
2,858120
2,858120
asked Jan 2 at 12:07
user10657934user10657934
506
506
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11
add a comment |
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11
add a comment |
4 Answers
4
active
oldest
votes
You just have to group the cols:
df.groupby([5,6,7,8]).sum()
No, it aggregate all numeric columns bysum
and OP need only4.
column, but also is necessary not lost all another columns - check my solution
– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
add a comment |
You need aggregate by DataFrameGroupBy.agg
with dictionary of columns with aggregated functions, here all column different by cols
are aggregate by last
or first
, only 4
column is aggregated by sum
:
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
add a comment |
Try this:
df.groupby(['column'])[['another column']].sum()
It groups by column
and add sum of another column
.
I used so that you understand you can group by multiple columns, like this:
df.groupby(['column1', 'column2'])
add a comment |
Input Dataframe: considering only first 3 rows,
data = {'col1': ['chr5', 'chr5', 'chr5'],
'col2': [112312630,137676883,137676900],
'col3': [112312650,137676900,137676949],
'col4': [31, 123,42],
'col5': ['chr5', 'chr5', 'chr5'],
'col6': [112312630 ,137676883 ,137676883 ],
'col7': [112321662, 137676949, 137676949],
'col8': ['DCP2', 'FAM53C', 'FAM53C'],
'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
}
df = pd.DataFrame(data = data)
df
Do like this,
cols = ['col5', 'col6', 'col7', 'col8']
col_sum = df.groupby(cols)['col4'].sum()
col_sum
Output: this is a multi-level dataframe. Last column is your output,
col5 col6 col7 col8
chr5 112312630 112321662 DCP2 31
137676883 137676949 FAM53C 165
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934char5
is a data column or index?
– Abdur Rehman
Jan 2 at 12:20
|
show 1 more comment
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54006086%2fhow-to-group-the-rows-and-sum-the-values-in-one-column-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
You just have to group the cols:
df.groupby([5,6,7,8]).sum()
No, it aggregate all numeric columns bysum
and OP need only4.
column, but also is necessary not lost all another columns - check my solution
– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
add a comment |
You just have to group the cols:
df.groupby([5,6,7,8]).sum()
No, it aggregate all numeric columns bysum
and OP need only4.
column, but also is necessary not lost all another columns - check my solution
– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
add a comment |
You just have to group the cols:
df.groupby([5,6,7,8]).sum()
You just have to group the cols:
df.groupby([5,6,7,8]).sum()
answered Jan 2 at 13:05


B. M.B. M.
13.4k12137
13.4k12137
No, it aggregate all numeric columns bysum
and OP need only4.
column, but also is necessary not lost all another columns - check my solution
– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
add a comment |
No, it aggregate all numeric columns bysum
and OP need only4.
column, but also is necessary not lost all another columns - check my solution
– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
No, it aggregate all numeric columns by
sum
and OP need only 4.
column, but also is necessary not lost all another columns - check my solution– jezrael
Jan 2 at 13:11
No, it aggregate all numeric columns by
sum
and OP need only 4.
column, but also is necessary not lost all another columns - check my solution– jezrael
Jan 2 at 13:11
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
I put OP on the way ... After of course if can be tuned.
– B. M.
Jan 2 at 13:35
add a comment |
You need aggregate by DataFrameGroupBy.agg
with dictionary of columns with aggregated functions, here all column different by cols
are aggregate by last
or first
, only 4
column is aggregated by sum
:
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
add a comment |
You need aggregate by DataFrameGroupBy.agg
with dictionary of columns with aggregated functions, here all column different by cols
are aggregate by last
or first
, only 4
column is aggregated by sum
:
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
add a comment |
You need aggregate by DataFrameGroupBy.agg
with dictionary of columns with aggregated functions, here all column different by cols
are aggregate by last
or first
, only 4
column is aggregated by sum
:
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
You need aggregate by DataFrameGroupBy.agg
with dictionary of columns with aggregated functions, here all column different by cols
are aggregate by last
or first
, only 4
column is aggregated by sum
:
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}
df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4
8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3
edited Jan 2 at 13:13
answered Jan 2 at 13:07


jezraeljezrael
350k26313388
350k26313388
add a comment |
add a comment |
Try this:
df.groupby(['column'])[['another column']].sum()
It groups by column
and add sum of another column
.
I used so that you understand you can group by multiple columns, like this:
df.groupby(['column1', 'column2'])
add a comment |
Try this:
df.groupby(['column'])[['another column']].sum()
It groups by column
and add sum of another column
.
I used so that you understand you can group by multiple columns, like this:
df.groupby(['column1', 'column2'])
add a comment |
Try this:
df.groupby(['column'])[['another column']].sum()
It groups by column
and add sum of another column
.
I used so that you understand you can group by multiple columns, like this:
df.groupby(['column1', 'column2'])
Try this:
df.groupby(['column'])[['another column']].sum()
It groups by column
and add sum of another column
.
I used so that you understand you can group by multiple columns, like this:
df.groupby(['column1', 'column2'])
answered Jan 2 at 12:10
Mehrdad PedramfarMehrdad Pedramfar
6,33411643
6,33411643
add a comment |
add a comment |
Input Dataframe: considering only first 3 rows,
data = {'col1': ['chr5', 'chr5', 'chr5'],
'col2': [112312630,137676883,137676900],
'col3': [112312650,137676900,137676949],
'col4': [31, 123,42],
'col5': ['chr5', 'chr5', 'chr5'],
'col6': [112312630 ,137676883 ,137676883 ],
'col7': [112321662, 137676949, 137676949],
'col8': ['DCP2', 'FAM53C', 'FAM53C'],
'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
}
df = pd.DataFrame(data = data)
df
Do like this,
cols = ['col5', 'col6', 'col7', 'col8']
col_sum = df.groupby(cols)['col4'].sum()
col_sum
Output: this is a multi-level dataframe. Last column is your output,
col5 col6 col7 col8
chr5 112312630 112321662 DCP2 31
137676883 137676949 FAM53C 165
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934char5
is a data column or index?
– Abdur Rehman
Jan 2 at 12:20
|
show 1 more comment
Input Dataframe: considering only first 3 rows,
data = {'col1': ['chr5', 'chr5', 'chr5'],
'col2': [112312630,137676883,137676900],
'col3': [112312650,137676900,137676949],
'col4': [31, 123,42],
'col5': ['chr5', 'chr5', 'chr5'],
'col6': [112312630 ,137676883 ,137676883 ],
'col7': [112321662, 137676949, 137676949],
'col8': ['DCP2', 'FAM53C', 'FAM53C'],
'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
}
df = pd.DataFrame(data = data)
df
Do like this,
cols = ['col5', 'col6', 'col7', 'col8']
col_sum = df.groupby(cols)['col4'].sum()
col_sum
Output: this is a multi-level dataframe. Last column is your output,
col5 col6 col7 col8
chr5 112312630 112321662 DCP2 31
137676883 137676949 FAM53C 165
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934char5
is a data column or index?
– Abdur Rehman
Jan 2 at 12:20
|
show 1 more comment
Input Dataframe: considering only first 3 rows,
data = {'col1': ['chr5', 'chr5', 'chr5'],
'col2': [112312630,137676883,137676900],
'col3': [112312650,137676900,137676949],
'col4': [31, 123,42],
'col5': ['chr5', 'chr5', 'chr5'],
'col6': [112312630 ,137676883 ,137676883 ],
'col7': [112321662, 137676949, 137676949],
'col8': ['DCP2', 'FAM53C', 'FAM53C'],
'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
}
df = pd.DataFrame(data = data)
df
Do like this,
cols = ['col5', 'col6', 'col7', 'col8']
col_sum = df.groupby(cols)['col4'].sum()
col_sum
Output: this is a multi-level dataframe. Last column is your output,
col5 col6 col7 col8
chr5 112312630 112321662 DCP2 31
137676883 137676949 FAM53C 165
Input Dataframe: considering only first 3 rows,
data = {'col1': ['chr5', 'chr5', 'chr5'],
'col2': [112312630,137676883,137676900],
'col3': [112312650,137676900,137676949],
'col4': [31, 123,42],
'col5': ['chr5', 'chr5', 'chr5'],
'col6': [112312630 ,137676883 ,137676883 ],
'col7': [112321662, 137676949, 137676949],
'col8': ['DCP2', 'FAM53C', 'FAM53C'],
'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
}
df = pd.DataFrame(data = data)
df
Do like this,
cols = ['col5', 'col6', 'col7', 'col8']
col_sum = df.groupby(cols)['col4'].sum()
col_sum
Output: this is a multi-level dataframe. Last column is your output,
col5 col6 col7 col8
chr5 112312630 112321662 DCP2 31
137676883 137676949 FAM53C 165
edited Jan 2 at 12:30
answered Jan 2 at 12:12


Abdur RehmanAbdur Rehman
626511
626511
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934char5
is a data column or index?
– Abdur Rehman
Jan 2 at 12:20
|
show 1 more comment
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934char5
is a data column or index?
– Abdur Rehman
Jan 2 at 12:20
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11
– user10657934
Jan 2 at 12:15
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
@user10657934 provide your dataset file so I can resolve that issue.
– Abdur Rehman
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
the example in the question is exactly a part of my file
– user10657934
Jan 2 at 12:17
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934 let me make dataframe from this and will give you a code example in a while.
– Abdur Rehman
Jan 2 at 12:19
@user10657934
char5
is a data column or index?– Abdur Rehman
Jan 2 at 12:20
@user10657934
char5
is a data column or index?– Abdur Rehman
Jan 2 at 12:20
|
show 1 more comment
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54006086%2fhow-to-group-the-rows-and-sum-the-values-in-one-column-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Possible duplicate of Pandas group-by and sum
– Daniel Mesejo
Jan 2 at 12:11