Subtract a batch of columns in pandas












0















I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.



Just for reference, this used to be my numpy solution for this



def subtract_baseline(data, baseline_columns, features_columns):
"""Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
num_blocks = len(features_columns)/len(baseline_columns)
block_size = len(baseline_columns)
for i in range(num_blocks):
#Grab each feature block and subract the baseline
init_col = block_size*i+features_columns[0]
final_col = init_col+block_size
data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
return data


To ilustrate better, we can create the following toy dataset:



data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)

L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0


The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).



   L1P1  L1P2  L2P1  L2P2   BP1   BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0


Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.



After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.



I tried doing this for just 1 batch using iloc but I get Nan values



df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]

L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0









share|improve this question





























    0















    I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.



    Just for reference, this used to be my numpy solution for this



    def subtract_baseline(data, baseline_columns, features_columns):
    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
    assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
    num_blocks = len(features_columns)/len(baseline_columns)
    block_size = len(baseline_columns)
    for i in range(num_blocks):
    #Grab each feature block and subract the baseline
    init_col = block_size*i+features_columns[0]
    final_col = init_col+block_size
    data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
    return data


    To ilustrate better, we can create the following toy dataset:



    data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
    df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)

    L1P1 L1P2 L2P1 L2P2 BP1 BP2
    0 10.0 11.0 12.0 13.0 1.0 10.0
    1 20.0 21.0 22.0 23.0 1.0 10.0
    2 30.0 31.0 32.0 33.0 1.0 10.0
    3 40.0 41.0 42.0 43.0 1.0 10.0
    4 50.0 51.0 52.0 53.0 1.0 10.0
    5 60.0 61.0 62.0 63.0 1.0 10.0


    The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).



       L1P1  L1P2  L2P1  L2P2   BP1   BP2
    0 9.0 1.0 11.0 3.0 1.0 10.0
    1 19.0 11.0 21.0 13.0 1.0 10.0
    2 29.0 21.0 31.0 23.0 1.0 10.0
    3 39.0 31.0 41.0 33.0 1.0 10.0
    4 49.0 41.0 51.0 43.0 1.0 10.0
    5 59.0 51.0 61.0 53.0 1.0 10.0


    Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.



    After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.



    I tried doing this for just 1 batch using iloc but I get Nan values



    df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]

    L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
    0 NaN NaN 12.0 13.0 1.0 10.0
    1 NaN NaN 22.0 23.0 1.0 10.0
    2 NaN NaN 32.0 33.0 1.0 10.0
    3 NaN NaN 42.0 43.0 1.0 10.0
    4 NaN NaN 52.0 53.0 1.0 10.0
    5 NaN NaN 62.0 63.0 1.0 10.0









    share|improve this question



























      0












      0








      0








      I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.



      Just for reference, this used to be my numpy solution for this



      def subtract_baseline(data, baseline_columns, features_columns):
      """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
      assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
      num_blocks = len(features_columns)/len(baseline_columns)
      block_size = len(baseline_columns)
      for i in range(num_blocks):
      #Grab each feature block and subract the baseline
      init_col = block_size*i+features_columns[0]
      final_col = init_col+block_size
      data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
      return data


      To ilustrate better, we can create the following toy dataset:



      data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
      df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)

      L1P1 L1P2 L2P1 L2P2 BP1 BP2
      0 10.0 11.0 12.0 13.0 1.0 10.0
      1 20.0 21.0 22.0 23.0 1.0 10.0
      2 30.0 31.0 32.0 33.0 1.0 10.0
      3 40.0 41.0 42.0 43.0 1.0 10.0
      4 50.0 51.0 52.0 53.0 1.0 10.0
      5 60.0 61.0 62.0 63.0 1.0 10.0


      The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).



         L1P1  L1P2  L2P1  L2P2   BP1   BP2
      0 9.0 1.0 11.0 3.0 1.0 10.0
      1 19.0 11.0 21.0 13.0 1.0 10.0
      2 29.0 21.0 31.0 23.0 1.0 10.0
      3 39.0 31.0 41.0 33.0 1.0 10.0
      4 49.0 41.0 51.0 43.0 1.0 10.0
      5 59.0 51.0 61.0 53.0 1.0 10.0


      Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.



      After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.



      I tried doing this for just 1 batch using iloc but I get Nan values



      df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]

      L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
      0 NaN NaN 12.0 13.0 1.0 10.0
      1 NaN NaN 22.0 23.0 1.0 10.0
      2 NaN NaN 32.0 33.0 1.0 10.0
      3 NaN NaN 42.0 43.0 1.0 10.0
      4 NaN NaN 52.0 53.0 1.0 10.0
      5 NaN NaN 62.0 63.0 1.0 10.0









      share|improve this question
















      I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.



      Just for reference, this used to be my numpy solution for this



      def subtract_baseline(data, baseline_columns, features_columns):
      """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
      assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
      num_blocks = len(features_columns)/len(baseline_columns)
      block_size = len(baseline_columns)
      for i in range(num_blocks):
      #Grab each feature block and subract the baseline
      init_col = block_size*i+features_columns[0]
      final_col = init_col+block_size
      data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
      return data


      To ilustrate better, we can create the following toy dataset:



      data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
      df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)

      L1P1 L1P2 L2P1 L2P2 BP1 BP2
      0 10.0 11.0 12.0 13.0 1.0 10.0
      1 20.0 21.0 22.0 23.0 1.0 10.0
      2 30.0 31.0 32.0 33.0 1.0 10.0
      3 40.0 41.0 42.0 43.0 1.0 10.0
      4 50.0 51.0 52.0 53.0 1.0 10.0
      5 60.0 61.0 62.0 63.0 1.0 10.0


      The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).



         L1P1  L1P2  L2P1  L2P2   BP1   BP2
      0 9.0 1.0 11.0 3.0 1.0 10.0
      1 19.0 11.0 21.0 13.0 1.0 10.0
      2 29.0 21.0 31.0 23.0 1.0 10.0
      3 39.0 31.0 41.0 33.0 1.0 10.0
      4 49.0 41.0 51.0 43.0 1.0 10.0
      5 59.0 51.0 61.0 53.0 1.0 10.0


      Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.



      After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.



      I tried doing this for just 1 batch using iloc but I get Nan values



      df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]

      L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
      0 NaN NaN 12.0 13.0 1.0 10.0
      1 NaN NaN 22.0 23.0 1.0 10.0
      2 NaN NaN 32.0 33.0 1.0 10.0
      3 NaN NaN 42.0 43.0 1.0 10.0
      4 NaN NaN 52.0 53.0 1.0 10.0
      5 NaN NaN 62.0 63.0 1.0 10.0






      python pandas numpy






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 20 '18 at 22:34







      Pedro Piacenza

















      asked Nov 20 '18 at 22:27









      Pedro PiacenzaPedro Piacenza

      32




      32
























          2 Answers
          2






          active

          oldest

          votes


















          0














          Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN



          df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
          df
          Out[176]:
          L1P1 L1P2 L2P1 L2P2 BP1 BP2
          0 9.0 1.0 12.0 13.0 1.0 10.0
          1 19.0 11.0 22.0 23.0 1.0 10.0
          2 29.0 21.0 32.0 33.0 1.0 10.0
          3 39.0 31.0 42.0 43.0 1.0 10.0
          4 49.0 41.0 52.0 53.0 1.0 10.0
          5 59.0 51.0 62.0 63.0 1.0 10.0





          share|improve this answer



















          • 1





            Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

            – Pedro Piacenza
            Nov 22 '18 at 19:36



















          0














          Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:



          df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
          df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]


          These two lines achieve what I think is your intent.






          share|improve this answer

























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402527%2fsubtract-a-batch-of-columns-in-pandas%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN



            df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
            df
            Out[176]:
            L1P1 L1P2 L2P1 L2P2 BP1 BP2
            0 9.0 1.0 12.0 13.0 1.0 10.0
            1 19.0 11.0 22.0 23.0 1.0 10.0
            2 29.0 21.0 32.0 33.0 1.0 10.0
            3 39.0 31.0 42.0 43.0 1.0 10.0
            4 49.0 41.0 52.0 53.0 1.0 10.0
            5 59.0 51.0 62.0 63.0 1.0 10.0





            share|improve this answer



















            • 1





              Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

              – Pedro Piacenza
              Nov 22 '18 at 19:36
















            0














            Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN



            df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
            df
            Out[176]:
            L1P1 L1P2 L2P1 L2P2 BP1 BP2
            0 9.0 1.0 12.0 13.0 1.0 10.0
            1 19.0 11.0 22.0 23.0 1.0 10.0
            2 29.0 21.0 32.0 33.0 1.0 10.0
            3 39.0 31.0 42.0 43.0 1.0 10.0
            4 49.0 41.0 52.0 53.0 1.0 10.0
            5 59.0 51.0 62.0 63.0 1.0 10.0





            share|improve this answer



















            • 1





              Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

              – Pedro Piacenza
              Nov 22 '18 at 19:36














            0












            0








            0







            Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN



            df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
            df
            Out[176]:
            L1P1 L1P2 L2P1 L2P2 BP1 BP2
            0 9.0 1.0 12.0 13.0 1.0 10.0
            1 19.0 11.0 22.0 23.0 1.0 10.0
            2 29.0 21.0 32.0 33.0 1.0 10.0
            3 39.0 31.0 42.0 43.0 1.0 10.0
            4 49.0 41.0 52.0 53.0 1.0 10.0
            5 59.0 51.0 62.0 63.0 1.0 10.0





            share|improve this answer













            Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN



            df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
            df
            Out[176]:
            L1P1 L1P2 L2P1 L2P2 BP1 BP2
            0 9.0 1.0 12.0 13.0 1.0 10.0
            1 19.0 11.0 22.0 23.0 1.0 10.0
            2 29.0 21.0 32.0 33.0 1.0 10.0
            3 39.0 31.0 42.0 43.0 1.0 10.0
            4 49.0 41.0 52.0 53.0 1.0 10.0
            5 59.0 51.0 62.0 63.0 1.0 10.0






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 20 '18 at 23:23









            Wen-BenWen-Ben

            109k83265




            109k83265








            • 1





              Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

              – Pedro Piacenza
              Nov 22 '18 at 19:36














            • 1





              Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

              – Pedro Piacenza
              Nov 22 '18 at 19:36








            1




            1





            Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

            – Pedro Piacenza
            Nov 22 '18 at 19:36





            Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

            – Pedro Piacenza
            Nov 22 '18 at 19:36













            0














            Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:



            df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
            df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]


            These two lines achieve what I think is your intent.






            share|improve this answer






























              0














              Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:



              df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
              df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]


              These two lines achieve what I think is your intent.






              share|improve this answer




























                0












                0








                0







                Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:



                df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
                df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]


                These two lines achieve what I think is your intent.






                share|improve this answer















                Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:



                df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
                df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]


                These two lines achieve what I think is your intent.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 20 '18 at 23:19

























                answered Nov 20 '18 at 23:09









                skumhestskumhest

                54




                54






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402527%2fsubtract-a-batch-of-columns-in-pandas%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    MongoDB - Not Authorized To Execute Command

                    How to fix TextFormField cause rebuild widget in Flutter

                    Npm cannot find a required file even through it is in the searched directory