how to group the rows and sum the values in one column in python












0















I have a tab separated file like this example:



small example:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:



expected output:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I am trying to do that in python using the following command but it does not really work. do you know how to fix it?



import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()









share|improve this question

























  • Possible duplicate of Pandas group-by and sum

    – Daniel Mesejo
    Jan 2 at 12:11
















0















I have a tab separated file like this example:



small example:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:



expected output:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I am trying to do that in python using the following command but it does not really work. do you know how to fix it?



import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()









share|improve this question

























  • Possible duplicate of Pandas group-by and sum

    – Daniel Mesejo
    Jan 2 at 12:11














0












0








0








I have a tab separated file like this example:



small example:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:



expected output:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I am trying to do that in python using the following command but it does not really work. do you know how to fix it?



import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()









share|improve this question
















I have a tab separated file like this example:



small example:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676883 137676900 123 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 137676900 137676949 42 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 92 chr5 139944064 139946344 SLC35A4 ENST00000323146.3
chr5 139945450 139945500 77 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I want to group the lines based on 5th, 6th and 7th columns and sum the values of 4th column in each group.
here is the expected output:



expected output:



chr5    112312630   112312650   31  chr5    112312630   112321662   DCP2    ENST00000543319.1
chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C ENST00000434981.2
chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4 ENST00000323146.3


I am trying to do that in python using the following command but it does not really work. do you know how to fix it?



import pandas as pd
df = pd.read_csv('myfile.txt', sep='t', header=None)
df = df.groupby(5, 6, 7, 8).sum()






python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 2 at 12:45









meW

2,858120




2,858120










asked Jan 2 at 12:07









user10657934user10657934

506




506













  • Possible duplicate of Pandas group-by and sum

    – Daniel Mesejo
    Jan 2 at 12:11



















  • Possible duplicate of Pandas group-by and sum

    – Daniel Mesejo
    Jan 2 at 12:11

















Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11





Possible duplicate of Pandas group-by and sum

– Daniel Mesejo
Jan 2 at 12:11












4 Answers
4






active

oldest

votes


















1














You just have to group the cols:



df.groupby([5,6,7,8]).sum()





share|improve this answer
























  • No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

    – jezrael
    Jan 2 at 13:11













  • I put OP on the way ... After of course if can be tuned.

    – B. M.
    Jan 2 at 13:35



















1














You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:



cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'last')
d[3] = 'sum'
print (d)
{0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}

df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4

8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3




cols = [5, 6, 7, 8]
d = dict.fromkeys(df.columns.difference(cols), 'first')
d[3] = 'sum'
print (d)
{0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}

df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
print (df)
0 1 2 3 4 5 6 7
0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4

8
0 ENST00000543319.1
1 ENST00000434981.2
2 ENST00000323146.3





share|improve this answer

































    0














    Try this:



    df.groupby(['column'])[['another column']].sum()


    It groups by column and add sum of another column.
    I used so that you understand you can group by multiple columns, like this:



    df.groupby(['column1', 'column2'])





    share|improve this answer































      0














      Input Dataframe: considering only first 3 rows,



      data = {'col1': ['chr5', 'chr5', 'chr5'],
      'col2': [112312630,137676883,137676900],
      'col3': [112312650,137676900,137676949],
      'col4': [31, 123,42],
      'col5': ['chr5', 'chr5', 'chr5'],
      'col6': [112312630 ,137676883 ,137676883 ],
      'col7': [112321662, 137676949, 137676949],
      'col8': ['DCP2', 'FAM53C', 'FAM53C'],
      'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
      }

      df = pd.DataFrame(data = data)
      df


      Do like this,



      cols = ['col5', 'col6', 'col7', 'col8']
      col_sum = df.groupby(cols)['col4'].sum()
      col_sum


      Output: this is a multi-level dataframe. Last column is your output,



      col5  col6       col7       col8  
      chr5 112312630 112321662 DCP2 31
      137676883 137676949 FAM53C 165





      share|improve this answer


























      • this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

        – user10657934
        Jan 2 at 12:15











      • @user10657934 provide your dataset file so I can resolve that issue.

        – Abdur Rehman
        Jan 2 at 12:17











      • the example in the question is exactly a part of my file

        – user10657934
        Jan 2 at 12:17











      • @user10657934 let me make dataframe from this and will give you a code example in a while.

        – Abdur Rehman
        Jan 2 at 12:19











      • @user10657934 char5 is a data column or index?

        – Abdur Rehman
        Jan 2 at 12:20











      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54006086%2fhow-to-group-the-rows-and-sum-the-values-in-one-column-in-python%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      You just have to group the cols:



      df.groupby([5,6,7,8]).sum()





      share|improve this answer
























      • No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

        – jezrael
        Jan 2 at 13:11













      • I put OP on the way ... After of course if can be tuned.

        – B. M.
        Jan 2 at 13:35
















      1














      You just have to group the cols:



      df.groupby([5,6,7,8]).sum()





      share|improve this answer
























      • No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

        – jezrael
        Jan 2 at 13:11













      • I put OP on the way ... After of course if can be tuned.

        – B. M.
        Jan 2 at 13:35














      1












      1








      1







      You just have to group the cols:



      df.groupby([5,6,7,8]).sum()





      share|improve this answer













      You just have to group the cols:



      df.groupby([5,6,7,8]).sum()






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Jan 2 at 13:05









      B. M.B. M.

      13.4k12137




      13.4k12137













      • No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

        – jezrael
        Jan 2 at 13:11













      • I put OP on the way ... After of course if can be tuned.

        – B. M.
        Jan 2 at 13:35



















      • No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

        – jezrael
        Jan 2 at 13:11













      • I put OP on the way ... After of course if can be tuned.

        – B. M.
        Jan 2 at 13:35

















      No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

      – jezrael
      Jan 2 at 13:11







      No, it aggregate all numeric columns by sum and OP need only 4. column, but also is necessary not lost all another columns - check my solution

      – jezrael
      Jan 2 at 13:11















      I put OP on the way ... After of course if can be tuned.

      – B. M.
      Jan 2 at 13:35





      I put OP on the way ... After of course if can be tuned.

      – B. M.
      Jan 2 at 13:35













      1














      You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:



      cols = [5, 6, 7, 8]
      d = dict.fromkeys(df.columns.difference(cols), 'last')
      d[3] = 'sum'
      print (d)
      {0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}

      df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
      print (df)
      0 1 2 3 4 5 6 7
      0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
      1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
      2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4

      8
      0 ENST00000543319.1
      1 ENST00000434981.2
      2 ENST00000323146.3




      cols = [5, 6, 7, 8]
      d = dict.fromkeys(df.columns.difference(cols), 'first')
      d[3] = 'sum'
      print (d)
      {0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}

      df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
      print (df)
      0 1 2 3 4 5 6 7
      0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
      1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
      2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4

      8
      0 ENST00000543319.1
      1 ENST00000434981.2
      2 ENST00000323146.3





      share|improve this answer






























        1














        You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:



        cols = [5, 6, 7, 8]
        d = dict.fromkeys(df.columns.difference(cols), 'last')
        d[3] = 'sum'
        print (d)
        {0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}

        df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
        print (df)
        0 1 2 3 4 5 6 7
        0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
        1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
        2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4

        8
        0 ENST00000543319.1
        1 ENST00000434981.2
        2 ENST00000323146.3




        cols = [5, 6, 7, 8]
        d = dict.fromkeys(df.columns.difference(cols), 'first')
        d[3] = 'sum'
        print (d)
        {0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}

        df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
        print (df)
        0 1 2 3 4 5 6 7
        0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
        1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
        2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4

        8
        0 ENST00000543319.1
        1 ENST00000434981.2
        2 ENST00000323146.3





        share|improve this answer




























          1












          1








          1







          You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:



          cols = [5, 6, 7, 8]
          d = dict.fromkeys(df.columns.difference(cols), 'last')
          d[3] = 'sum'
          print (d)
          {0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}

          df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
          print (df)
          0 1 2 3 4 5 6 7
          0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
          1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
          2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4

          8
          0 ENST00000543319.1
          1 ENST00000434981.2
          2 ENST00000323146.3




          cols = [5, 6, 7, 8]
          d = dict.fromkeys(df.columns.difference(cols), 'first')
          d[3] = 'sum'
          print (d)
          {0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}

          df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
          print (df)
          0 1 2 3 4 5 6 7
          0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
          1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
          2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4

          8
          0 ENST00000543319.1
          1 ENST00000434981.2
          2 ENST00000323146.3





          share|improve this answer















          You need aggregate by DataFrameGroupBy.agg with dictionary of columns with aggregated functions, here all column different by cols are aggregate by last or first, only 4 column is aggregated by sum:



          cols = [5, 6, 7, 8]
          d = dict.fromkeys(df.columns.difference(cols), 'last')
          d[3] = 'sum'
          print (d)
          {0: 'last', 1: 'last', 2: 'last', 3: 'sum', 4: 'last'}

          df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
          print (df)
          0 1 2 3 4 5 6 7
          0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
          1 chr5 137676900 137676949 165 chr5 137676883 137676949 FAM53C
          2 chr5 139945450 139945500 169 chr5 139944064 139946344 SLC35A4

          8
          0 ENST00000543319.1
          1 ENST00000434981.2
          2 ENST00000323146.3




          cols = [5, 6, 7, 8]
          d = dict.fromkeys(df.columns.difference(cols), 'first')
          d[3] = 'sum'
          print (d)
          {0: 'first', 1: 'first', 2: 'first', 3: 'sum', 4: 'first'}

          df = df.groupby([5, 6, 7, 8], as_index=False).agg(d).reindex(columns=df.columns)
          print (df)
          0 1 2 3 4 5 6 7
          0 chr5 112312630 112312650 31 chr5 112312630 112321662 DCP2
          1 chr5 137676883 137676900 165 chr5 137676883 137676949 FAM53C
          2 chr5 139944400 139944450 169 chr5 139944064 139946344 SLC35A4

          8
          0 ENST00000543319.1
          1 ENST00000434981.2
          2 ENST00000323146.3






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jan 2 at 13:13

























          answered Jan 2 at 13:07









          jezraeljezrael

          350k26313388




          350k26313388























              0














              Try this:



              df.groupby(['column'])[['another column']].sum()


              It groups by column and add sum of another column.
              I used so that you understand you can group by multiple columns, like this:



              df.groupby(['column1', 'column2'])





              share|improve this answer




























                0














                Try this:



                df.groupby(['column'])[['another column']].sum()


                It groups by column and add sum of another column.
                I used so that you understand you can group by multiple columns, like this:



                df.groupby(['column1', 'column2'])





                share|improve this answer


























                  0












                  0








                  0







                  Try this:



                  df.groupby(['column'])[['another column']].sum()


                  It groups by column and add sum of another column.
                  I used so that you understand you can group by multiple columns, like this:



                  df.groupby(['column1', 'column2'])





                  share|improve this answer













                  Try this:



                  df.groupby(['column'])[['another column']].sum()


                  It groups by column and add sum of another column.
                  I used so that you understand you can group by multiple columns, like this:



                  df.groupby(['column1', 'column2'])






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Jan 2 at 12:10









                  Mehrdad PedramfarMehrdad Pedramfar

                  6,33411643




                  6,33411643























                      0














                      Input Dataframe: considering only first 3 rows,



                      data = {'col1': ['chr5', 'chr5', 'chr5'],
                      'col2': [112312630,137676883,137676900],
                      'col3': [112312650,137676900,137676949],
                      'col4': [31, 123,42],
                      'col5': ['chr5', 'chr5', 'chr5'],
                      'col6': [112312630 ,137676883 ,137676883 ],
                      'col7': [112321662, 137676949, 137676949],
                      'col8': ['DCP2', 'FAM53C', 'FAM53C'],
                      'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
                      }

                      df = pd.DataFrame(data = data)
                      df


                      Do like this,



                      cols = ['col5', 'col6', 'col7', 'col8']
                      col_sum = df.groupby(cols)['col4'].sum()
                      col_sum


                      Output: this is a multi-level dataframe. Last column is your output,



                      col5  col6       col7       col8  
                      chr5 112312630 112321662 DCP2 31
                      137676883 137676949 FAM53C 165





                      share|improve this answer


























                      • this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                        – user10657934
                        Jan 2 at 12:15











                      • @user10657934 provide your dataset file so I can resolve that issue.

                        – Abdur Rehman
                        Jan 2 at 12:17











                      • the example in the question is exactly a part of my file

                        – user10657934
                        Jan 2 at 12:17











                      • @user10657934 let me make dataframe from this and will give you a code example in a while.

                        – Abdur Rehman
                        Jan 2 at 12:19











                      • @user10657934 char5 is a data column or index?

                        – Abdur Rehman
                        Jan 2 at 12:20
















                      0














                      Input Dataframe: considering only first 3 rows,



                      data = {'col1': ['chr5', 'chr5', 'chr5'],
                      'col2': [112312630,137676883,137676900],
                      'col3': [112312650,137676900,137676949],
                      'col4': [31, 123,42],
                      'col5': ['chr5', 'chr5', 'chr5'],
                      'col6': [112312630 ,137676883 ,137676883 ],
                      'col7': [112321662, 137676949, 137676949],
                      'col8': ['DCP2', 'FAM53C', 'FAM53C'],
                      'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
                      }

                      df = pd.DataFrame(data = data)
                      df


                      Do like this,



                      cols = ['col5', 'col6', 'col7', 'col8']
                      col_sum = df.groupby(cols)['col4'].sum()
                      col_sum


                      Output: this is a multi-level dataframe. Last column is your output,



                      col5  col6       col7       col8  
                      chr5 112312630 112321662 DCP2 31
                      137676883 137676949 FAM53C 165





                      share|improve this answer


























                      • this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                        – user10657934
                        Jan 2 at 12:15











                      • @user10657934 provide your dataset file so I can resolve that issue.

                        – Abdur Rehman
                        Jan 2 at 12:17











                      • the example in the question is exactly a part of my file

                        – user10657934
                        Jan 2 at 12:17











                      • @user10657934 let me make dataframe from this and will give you a code example in a while.

                        – Abdur Rehman
                        Jan 2 at 12:19











                      • @user10657934 char5 is a data column or index?

                        – Abdur Rehman
                        Jan 2 at 12:20














                      0












                      0








                      0







                      Input Dataframe: considering only first 3 rows,



                      data = {'col1': ['chr5', 'chr5', 'chr5'],
                      'col2': [112312630,137676883,137676900],
                      'col3': [112312650,137676900,137676949],
                      'col4': [31, 123,42],
                      'col5': ['chr5', 'chr5', 'chr5'],
                      'col6': [112312630 ,137676883 ,137676883 ],
                      'col7': [112321662, 137676949, 137676949],
                      'col8': ['DCP2', 'FAM53C', 'FAM53C'],
                      'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
                      }

                      df = pd.DataFrame(data = data)
                      df


                      Do like this,



                      cols = ['col5', 'col6', 'col7', 'col8']
                      col_sum = df.groupby(cols)['col4'].sum()
                      col_sum


                      Output: this is a multi-level dataframe. Last column is your output,



                      col5  col6       col7       col8  
                      chr5 112312630 112321662 DCP2 31
                      137676883 137676949 FAM53C 165





                      share|improve this answer















                      Input Dataframe: considering only first 3 rows,



                      data = {'col1': ['chr5', 'chr5', 'chr5'],
                      'col2': [112312630,137676883,137676900],
                      'col3': [112312650,137676900,137676949],
                      'col4': [31, 123,42],
                      'col5': ['chr5', 'chr5', 'chr5'],
                      'col6': [112312630 ,137676883 ,137676883 ],
                      'col7': [112321662, 137676949, 137676949],
                      'col8': ['DCP2', 'FAM53C', 'FAM53C'],
                      'col9': ['ENST00000543319.1', 'ENST00000434981.2', 'ENST00000434981.2']
                      }

                      df = pd.DataFrame(data = data)
                      df


                      Do like this,



                      cols = ['col5', 'col6', 'col7', 'col8']
                      col_sum = df.groupby(cols)['col4'].sum()
                      col_sum


                      Output: this is a multi-level dataframe. Last column is your output,



                      col5  col6       col7       col8  
                      chr5 112312630 112321662 DCP2 31
                      137676883 137676949 FAM53C 165






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Jan 2 at 12:30

























                      answered Jan 2 at 12:12









                      Abdur RehmanAbdur Rehman

                      626511




                      626511













                      • this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                        – user10657934
                        Jan 2 at 12:15











                      • @user10657934 provide your dataset file so I can resolve that issue.

                        – Abdur Rehman
                        Jan 2 at 12:17











                      • the example in the question is exactly a part of my file

                        – user10657934
                        Jan 2 at 12:17











                      • @user10657934 let me make dataframe from this and will give you a code example in a while.

                        – Abdur Rehman
                        Jan 2 at 12:19











                      • @user10657934 char5 is a data column or index?

                        – Abdur Rehman
                        Jan 2 at 12:20



















                      • this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                        – user10657934
                        Jan 2 at 12:15











                      • @user10657934 provide your dataset file so I can resolve that issue.

                        – Abdur Rehman
                        Jan 2 at 12:17











                      • the example in the question is exactly a part of my file

                        – user10657934
                        Jan 2 at 12:17











                      • @user10657934 let me make dataframe from this and will give you a code example in a while.

                        – Abdur Rehman
                        Jan 2 at 12:19











                      • @user10657934 char5 is a data column or index?

                        – Abdur Rehman
                        Jan 2 at 12:20

















                      this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                      – user10657934
                      Jan 2 at 12:15





                      this returns : 2994450 2994675 RHNO1 ENST00000461997.2 chr12 7155218 7155278 CTDNEP1 ENST00000318988.6 chr17chr17 9286722 9286821 DENND5A ENST00000328194.3 chr11

                      – user10657934
                      Jan 2 at 12:15













                      @user10657934 provide your dataset file so I can resolve that issue.

                      – Abdur Rehman
                      Jan 2 at 12:17





                      @user10657934 provide your dataset file so I can resolve that issue.

                      – Abdur Rehman
                      Jan 2 at 12:17













                      the example in the question is exactly a part of my file

                      – user10657934
                      Jan 2 at 12:17





                      the example in the question is exactly a part of my file

                      – user10657934
                      Jan 2 at 12:17













                      @user10657934 let me make dataframe from this and will give you a code example in a while.

                      – Abdur Rehman
                      Jan 2 at 12:19





                      @user10657934 let me make dataframe from this and will give you a code example in a while.

                      – Abdur Rehman
                      Jan 2 at 12:19













                      @user10657934 char5 is a data column or index?

                      – Abdur Rehman
                      Jan 2 at 12:20





                      @user10657934 char5 is a data column or index?

                      – Abdur Rehman
                      Jan 2 at 12:20


















                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54006086%2fhow-to-group-the-rows-and-sum-the-values-in-one-column-in-python%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      MongoDB - Not Authorized To Execute Command

                      How to fix TextFormField cause rebuild widget in Flutter

                      Npm cannot find a required file even through it is in the searched directory