Subtract a batch of columns in pandas
I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.
Just for reference, this used to be my numpy solution for this
def subtract_baseline(data, baseline_columns, features_columns):
"""Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
num_blocks = len(features_columns)/len(baseline_columns)
block_size = len(baseline_columns)
for i in range(num_blocks):
#Grab each feature block and subract the baseline
init_col = block_size*i+features_columns[0]
final_col = init_col+block_size
data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
return data
To ilustrate better, we can create the following toy dataset:
data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0
The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0
Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.
After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.
I tried doing this for just 1 batch using iloc but I get Nan values
df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]
L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0
python pandas numpy
add a comment |
I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.
Just for reference, this used to be my numpy solution for this
def subtract_baseline(data, baseline_columns, features_columns):
"""Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
num_blocks = len(features_columns)/len(baseline_columns)
block_size = len(baseline_columns)
for i in range(num_blocks):
#Grab each feature block and subract the baseline
init_col = block_size*i+features_columns[0]
final_col = init_col+block_size
data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
return data
To ilustrate better, we can create the following toy dataset:
data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0
The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0
Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.
After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.
I tried doing this for just 1 batch using iloc but I get Nan values
df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]
L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0
python pandas numpy
add a comment |
I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.
Just for reference, this used to be my numpy solution for this
def subtract_baseline(data, baseline_columns, features_columns):
"""Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
num_blocks = len(features_columns)/len(baseline_columns)
block_size = len(baseline_columns)
for i in range(num_blocks):
#Grab each feature block and subract the baseline
init_col = block_size*i+features_columns[0]
final_col = init_col+block_size
data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
return data
To ilustrate better, we can create the following toy dataset:
data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0
The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0
Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.
After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.
I tried doing this for just 1 batch using iloc but I get Nan values
df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]
L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0
python pandas numpy
I am transitioning to using pandas for handling my csv datasets. I am currently trying to do in pandas what I was already doing very easily in numpy: subtract a group of columns from another group several times. This is effectively a element-wise matrix subtraction.
Just for reference, this used to be my numpy solution for this
def subtract_baseline(data, baseline_columns, features_columns):
"""Takes in a list of baseline columns and feature columns, and subtracts the baseline values from all features"""
assert len(features_columns)%len(baseline_columns)==0, "The number of feature columns is not divisible by baseline columns"
num_blocks = len(features_columns)/len(baseline_columns)
block_size = len(baseline_columns)
for i in range(num_blocks):
#Grab each feature block and subract the baseline
init_col = block_size*i+features_columns[0]
final_col = init_col+block_size
data[:, init_col:final_col] = numpy.subtract(data[:, init_col:final_col], data[:,baseline_columns])
return data
To ilustrate better, we can create the following toy dataset:
data = [[10,11,12,13,1,10],[20,21,22,23,1,10],[30,31,32,33,1,10],[40,41,42,43,1,10],[50,51,52,53,1,10],[60,61,62,63,1,10]]
df = pd.DataFrame(data,columns=['L1P1','L1P2','L2P1','L2P2','BP1','BP2'],dtype=float)
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 10.0 11.0 12.0 13.0 1.0 10.0
1 20.0 21.0 22.0 23.0 1.0 10.0
2 30.0 31.0 32.0 33.0 1.0 10.0
3 40.0 41.0 42.0 43.0 1.0 10.0
4 50.0 51.0 52.0 53.0 1.0 10.0
5 60.0 61.0 62.0 63.0 1.0 10.0
The correct output would be the result of grabbing the values in L1P1 & L1P2 and subtracting G1P1 & G1P2 (AKA the baseline), then doing it again for L2P1, L2P2 and any other columns there might be (this is what my for loop does in the original function).
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 11.0 3.0 1.0 10.0
1 19.0 11.0 21.0 13.0 1.0 10.0
2 29.0 21.0 31.0 23.0 1.0 10.0
3 39.0 31.0 41.0 33.0 1.0 10.0
4 49.0 41.0 51.0 43.0 1.0 10.0
5 59.0 51.0 61.0 53.0 1.0 10.0
Note that labels for the dataframe should not change, and ideally I'd want a method that relies on the columns indexes, not labels, because the actual data block is 30 columns, not 2 like in this example. This is how my original function in numpy worked, the parameters baseline_columns and features_columns were just lists of the columns indexes.
After this the baseline columns would be deleted all together from the dataframe, as their function has already been fulfilled.
I tried doing this for just 1 batch using iloc but I get Nan values
df.iloc[:,[0,1]] = df.iloc[:,[0,1]] - df.iloc[:,[4,5]]
L1P1 L1P2 L2P1 L2P2 G1P1 G1P2
0 NaN NaN 12.0 13.0 1.0 10.0
1 NaN NaN 22.0 23.0 1.0 10.0
2 NaN NaN 32.0 33.0 1.0 10.0
3 NaN NaN 42.0 43.0 1.0 10.0
4 NaN NaN 52.0 53.0 1.0 10.0
5 NaN NaN 62.0 63.0 1.0 10.0
python pandas numpy
python pandas numpy
edited Nov 20 '18 at 22:34
Pedro Piacenza
asked Nov 20 '18 at 22:27
Pedro PiacenzaPedro Piacenza
32
32
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
Adding .values
at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
add a comment |
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402527%2fsubtract-a-batch-of-columns-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Adding .values
at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
add a comment |
Adding .values
at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
add a comment |
Adding .values
at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
Adding .values
at the end , pandas dataframe will search the column and index match to do the subtract , since the column is not match for 0,1 and 4,5 it will return NaN
df.iloc[:,[0,1]]=df.iloc[:,[0,1]].values - df.iloc[:,[4,5]].values
df
Out[176]:
L1P1 L1P2 L2P1 L2P2 BP1 BP2
0 9.0 1.0 12.0 13.0 1.0 10.0
1 19.0 11.0 22.0 23.0 1.0 10.0
2 29.0 21.0 32.0 33.0 1.0 10.0
3 39.0 31.0 42.0 43.0 1.0 10.0
4 49.0 41.0 52.0 53.0 1.0 10.0
5 59.0 51.0 62.0 63.0 1.0 10.0
answered Nov 20 '18 at 23:23


Wen-BenWen-Ben
109k83265
109k83265
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
add a comment |
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
1
1
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
Worked like I needed once I replaced the hardcoded columns to ranges! thanks!
– Pedro Piacenza
Nov 22 '18 at 19:36
add a comment |
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
add a comment |
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
add a comment |
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
Is there a reason you want to do it in one line? I.e. would it be okay for your purposes to do it with two lines:
df.iloc[:,0] = df.iloc[:,0] - df.iloc[:,4]
df.iloc[:,1] = df.iloc[:,1] - df.iloc[:,5]
These two lines achieve what I think is your intent.
edited Nov 20 '18 at 23:19
answered Nov 20 '18 at 23:09
skumhestskumhest
54
54
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53402527%2fsubtract-a-batch-of-columns-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown