How can I assign values to a DataFrame based on column-to-value mapping?
I have a DataFrame that looks like this
df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})
df
A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607
and a mapping
mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}
In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).
I am trying to assign new columns, A_result
through F_result
, whose values come from columns X0
through X7
using the values in A
through F
as a key.
For example, the second element of C_result
should be 201
because the second element of C
is Y1
and the second element of X1
has a value of 201
.
The code I have to do this now is
for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0
for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]
df.filter(regex='_result', axis=1)
A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600
This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?
python pandas dataframe
add a comment |
I have a DataFrame that looks like this
df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})
df
A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607
and a mapping
mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}
In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).
I am trying to assign new columns, A_result
through F_result
, whose values come from columns X0
through X7
using the values in A
through F
as a key.
For example, the second element of C_result
should be 201
because the second element of C
is Y1
and the second element of X1
has a value of 201
.
The code I have to do this now is
for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0
for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]
df.filter(regex='_result', axis=1)
A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600
This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?
python pandas dataframe
add a comment |
I have a DataFrame that looks like this
df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})
df
A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607
and a mapping
mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}
In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).
I am trying to assign new columns, A_result
through F_result
, whose values come from columns X0
through X7
using the values in A
through F
as a key.
For example, the second element of C_result
should be 201
because the second element of C
is Y1
and the second element of X1
has a value of 201
.
The code I have to do this now is
for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0
for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]
df.filter(regex='_result', axis=1)
A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600
This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?
python pandas dataframe
I have a DataFrame that looks like this
df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})
df
A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607
and a mapping
mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}
In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).
I am trying to assign new columns, A_result
through F_result
, whose values come from columns X0
through X7
using the values in A
through F
as a key.
For example, the second element of C_result
should be 201
because the second element of C
is Y1
and the second element of X1
has a value of 201
.
The code I have to do this now is
for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0
for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]
df.filter(regex='_result', axis=1)
A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600
This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?
python pandas dataframe
python pandas dataframe
asked Nov 20 '18 at 17:54
Kris HarperKris Harper
2,98652864
2,98652864
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000
) the difference is quite noticeable.
– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53398806%2fhow-can-i-assign-values-to-a-dataframe-based-on-column-to-value-mapping%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000
) the difference is quite noticeable.
– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
add a comment |
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000
) the difference is quite noticeable.
– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
add a comment |
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
edited Nov 20 '18 at 19:09
answered Nov 20 '18 at 18:21


IdlehandsIdlehands
4,4551518
4,4551518
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000
) the difference is quite noticeable.
– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
add a comment |
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000
) the difference is quite noticeable.
– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
Apologies, I mistimed, this in indeed much faster. +1 !
– jpp
Nov 20 '18 at 19:01
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (
n=1000
) the difference is quite noticeable.– Idlehands
Nov 20 '18 at 19:02
@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (
n=1000
) the difference is quite noticeable.– Idlehands
Nov 20 '18 at 19:02
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
Awesome, this cut the execution time in half.
– Kris Harper
Nov 22 '18 at 19:33
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53398806%2fhow-can-i-assign-values-to-a-dataframe-based-on-column-to-value-mapping%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown