Subtract a batch of columns in pandas

I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.

Just for reference, this used to be my numpy solution for this

def subtract_baseline(data, baseline_columns, features_columns):

    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""

    assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"

    num_blocks = len(features_columns)/len(baseline_columns)    

    block_size = len(baseline_columns)                         

    for i in range(num_blocks):

        #Grab each feature block and subract the baseline

        init_col = block_size*i+features_columns[0]

        final_col = init_col+block_size

        data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])

    return data

To ilustrate better, we can create the following toy dataset:

data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]

df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)



   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0  10.0  11.0  12.0  13.0   1.0  10.0

1  20.0  21.0  22.0  23.0   1.0  10.0

2  30.0  31.0  32.0  33.0   1.0  10.0

3  40.0  41.0  42.0  43.0   1.0  10.0

4  50.0  51.0  52.0  53.0   1.0  10.0

5  60.0  61.0  62.0  63.0   1.0  10.0

The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).

   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0   9.0   1.0  11.0   3.0   1.0  10.0

1  19.0  11.0  21.0  13.0   1.0  10.0

2  29.0  21.0  31.0  23.0   1.0  10.0

3  39.0  31.0  41.0  33.0   1.0  10.0

4  49.0  41.0  51.0  43.0   1.0  10.0

5  59.0  51.0  61.0  53.0   1.0  10.0

Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.

After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.

I tried doing this for just 1 batch using iloc but I get Nan values

df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]



   L1P1  L1P2  L2P1  L2P2  G1P1  G1P2

0   NaN   NaN  12.0  13.0   1.0  10.0

1   NaN   NaN  22.0  23.0   1.0  10.0

2   NaN   NaN  32.0  33.0   1.0  10.0

3   NaN   NaN  42.0  43.0   1.0  10.0

4   NaN   NaN  52.0  53.0   1.0  10.0

5   NaN   NaN  62.0  63.0   1.0  10.0

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

add a comment |

Just for reference, this used to be my numpy solution for this

def subtract_baseline(data, baseline_columns, features_columns):

    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""

    assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"

    num_blocks = len(features_columns)/len(baseline_columns)    

    block_size = len(baseline_columns)                         

    for i in range(num_blocks):

        #Grab each feature block and subract the baseline

        init_col = block_size*i+features_columns[0]

        final_col = init_col+block_size

        data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])

    return data

To ilustrate better, we can create the following toy dataset:

data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]

df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)



   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0  10.0  11.0  12.0  13.0   1.0  10.0

1  20.0  21.0  22.0  23.0   1.0  10.0

2  30.0  31.0  32.0  33.0   1.0  10.0

3  40.0  41.0  42.0  43.0   1.0  10.0

4  50.0  51.0  52.0  53.0   1.0  10.0

5  60.0  61.0  62.0  63.0   1.0  10.0

   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0   9.0   1.0  11.0   3.0   1.0  10.0

1  19.0  11.0  21.0  13.0   1.0  10.0

2  29.0  21.0  31.0  23.0   1.0  10.0

3  39.0  31.0  41.0  33.0   1.0  10.0

4  49.0  41.0  51.0  43.0   1.0  10.0

5  59.0  51.0  61.0  53.0   1.0  10.0

After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.

I tried doing this for just 1 batch using iloc but I get Nan values

df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]



   L1P1  L1P2  L2P1  L2P2  G1P1  G1P2

0   NaN   NaN  12.0  13.0   1.0  10.0

1   NaN   NaN  22.0  23.0   1.0  10.0

2   NaN   NaN  32.0  33.0   1.0  10.0

3   NaN   NaN  42.0  43.0   1.0  10.0

4   NaN   NaN  52.0  53.0   1.0  10.0

5   NaN   NaN  62.0  63.0   1.0  10.0

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

add a comment |

Just for reference, this used to be my numpy solution for this

def subtract_baseline(data, baseline_columns, features_columns):

    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""

    assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"

    num_blocks = len(features_columns)/len(baseline_columns)    

    block_size = len(baseline_columns)                         

    for i in range(num_blocks):

        #Grab each feature block and subract the baseline

        init_col = block_size*i+features_columns[0]

        final_col = init_col+block_size

        data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])

    return data

To ilustrate better, we can create the following toy dataset:

data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]

df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)



   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0  10.0  11.0  12.0  13.0   1.0  10.0

1  20.0  21.0  22.0  23.0   1.0  10.0

2  30.0  31.0  32.0  33.0   1.0  10.0

3  40.0  41.0  42.0  43.0   1.0  10.0

4  50.0  51.0  52.0  53.0   1.0  10.0

5  60.0  61.0  62.0  63.0   1.0  10.0

   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0   9.0   1.0  11.0   3.0   1.0  10.0

1  19.0  11.0  21.0  13.0   1.0  10.0

2  29.0  21.0  31.0  23.0   1.0  10.0

3  39.0  31.0  41.0  33.0   1.0  10.0

4  49.0  41.0  51.0  43.0   1.0  10.0

5  59.0  51.0  61.0  53.0   1.0  10.0

After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.

I tried doing this for just 1 batch using iloc but I get Nan values

df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]



   L1P1  L1P2  L2P1  L2P2  G1P1  G1P2

0   NaN   NaN  12.0  13.0   1.0  10.0

1   NaN   NaN  22.0  23.0   1.0  10.0

2   NaN   NaN  32.0  33.0   1.0  10.0

3   NaN   NaN  42.0  43.0   1.0  10.0

4   NaN   NaN  52.0  53.0   1.0  10.0

5   NaN   NaN  62.0  63.0   1.0  10.0

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

Just for reference, this used to be my numpy solution for this

def subtract_baseline(data, baseline_columns, features_columns):

    """Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""

    assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"

    num_blocks = len(features_columns)/len(baseline_columns)    

    block_size = len(baseline_columns)                         

    for i in range(num_blocks):

        #Grab each feature block and subract the baseline

        init_col = block_size*i+features_columns[0]

        final_col = init_col+block_size

        data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])

    return data

To ilustrate better, we can create the following toy dataset:

data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]

df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)



   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0  10.0  11.0  12.0  13.0   1.0  10.0

1  20.0  21.0  22.0  23.0   1.0  10.0

2  30.0  31.0  32.0  33.0   1.0  10.0

3  40.0  41.0  42.0  43.0   1.0  10.0

4  50.0  51.0  52.0  53.0   1.0  10.0

5  60.0  61.0  62.0  63.0   1.0  10.0

   L1P1  L1P2  L2P1  L2P2   BP1   BP2

0   9.0   1.0  11.0   3.0   1.0  10.0

1  19.0  11.0  21.0  13.0   1.0  10.0

2  29.0  21.0  31.0  23.0   1.0  10.0

3  39.0  31.0  41.0  33.0   1.0  10.0

4  49.0  41.0  51.0  43.0   1.0  10.0

5  59.0  51.0  61.0  53.0   1.0  10.0

After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.

I tried doing this for just 1 batch using iloc but I get Nan values

df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]



   L1P1  L1P2  L2P1  L2P2  G1P1  G1P2

0   NaN   NaN  12.0  13.0   1.0  10.0

1   NaN   NaN  22.0  23.0   1.0  10.0

2   NaN   NaN  32.0  33.0   1.0  10.0

3   NaN   NaN  42.0  43.0   1.0  10.0

4   NaN   NaN  52.0  53.0   1.0  10.0

5   NaN   NaN  62.0  63.0   1.0  10.0

python pandas numpy

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

edited Nov 20 '18 at 22:34

asked Nov 20 '18 at 22:27

Pedro Piacenza

asked Nov 20 '18 at 22:27

Pedro Piacenza

asked Nov 20 '18 at 22:27

Pedro Piacenza

add a comment |

2 Answers
2

active

oldest

votes

Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN

df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values

df

Out[176]: 

   L1P1  L1P2  L2P1  L2P2  BP1   BP2

0   9.0   1.0  12.0  13.0  1.0  10.0

1  19.0  11.0  22.0  23.0  1.0  10.0

2  29.0  21.0  32.0  33.0  1.0  10.0

3  39.0  31.0  42.0  43.0  1.0  10.0

4  49.0  41.0  52.0  53.0  1.0  10.0

5  59.0  51.0  62.0  63.0  1.0  10.0

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

1

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

add a comment |

Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:

df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]

df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]

These two lines achieve what I think is your intent.

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402527%2fsubtract-a-batch-of-columns-in-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN

df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values

df

Out[176]: 

   L1P1  L1P2  L2P1  L2P2  BP1   BP2

0   9.0   1.0  12.0  13.0  1.0  10.0

1  19.0  11.0  22.0  23.0  1.0  10.0

2  29.0  21.0  32.0  33.0  1.0  10.0

3  39.0  31.0  42.0  43.0  1.0  10.0

4  49.0  41.0  52.0  53.0  1.0  10.0

5  59.0  51.0  62.0  63.0  1.0  10.0

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

1

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

add a comment |

Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN

df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values

df

Out[176]: 

   L1P1  L1P2  L2P1  L2P2  BP1   BP2

0   9.0   1.0  12.0  13.0  1.0  10.0

1  19.0  11.0  22.0  23.0  1.0  10.0

2  29.0  21.0  32.0  33.0  1.0  10.0

3  39.0  31.0  42.0  43.0  1.0  10.0

4  49.0  41.0  52.0  53.0  1.0  10.0

5  59.0  51.0  62.0  63.0  1.0  10.0

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

1

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

add a comment |

Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN

df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values

df

Out[176]: 

   L1P1  L1P2  L2P1  L2P2  BP1   BP2

0   9.0   1.0  12.0  13.0  1.0  10.0

1  19.0  11.0  22.0  23.0  1.0  10.0

2  29.0  21.0  32.0  33.0  1.0  10.0

3  39.0  31.0  42.0  43.0  1.0  10.0

4  49.0  41.0  52.0  53.0  1.0  10.0

5  59.0  51.0  62.0  63.0  1.0  10.0

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

Adding .values at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN

df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values

df

Out[176]: 

   L1P1  L1P2  L2P1  L2P2  BP1   BP2

0   9.0   1.0  12.0  13.0  1.0  10.0

1  19.0  11.0  22.0  23.0  1.0  10.0

2  29.0  21.0  32.0  33.0  1.0  10.0

3  39.0  31.0  42.0  43.0  1.0  10.0

4  49.0  41.0  52.0  53.0  1.0  10.0

5  59.0  51.0  62.0  63.0  1.0  10.0

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

answered Nov 20 '18 at 23:23

Wen-Ben

109k83265

1

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

add a comment |

1

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

Worked like I needed once I replaced the hardcoded columns to ranges! thanks!

– Pedro Piacenza
Nov 22 '18 at 19:36

add a comment |

Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:

df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]

df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]

These two lines achieve what I think is your intent.

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

add a comment |

Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:

df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]

df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]

These two lines achieve what I think is your intent.

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

add a comment |

Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:

df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]

df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]

These two lines achieve what I think is your intent.

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:

df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]

df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]

These two lines achieve what I think is your intent.

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

edited Nov 20 '18 at 23:19

answered Nov 20 '18 at 23:09

skumhest

answered Nov 20 '18 at 23:09

skumhest

answered Nov 20 '18 at 23:09

skumhest

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu