How can I assign values to a DataFrame based on column-to-value mapping?

I have a DataFrame that looks like this

df = pd.DataFrame({

    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],

    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],

    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],

    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],

    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],

    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],

    "X0": [100, 200, 300, 400, 500, 600],

    "X1": [101, 201, 301, 401, 501, 601],

    "X2": [102, 202, 302, 402, 502, 602],

    "X3": [103, 203, 303, 403, 503, 603],

    "X4": [104, 204, 304, 404, 504, 604],

    "X5": [105, 205, 305, 405, 505, 605],

    "X6": [106, 206, 306, 406, 506, 606],

    "X7": [107, 207, 307, 407, 507, 607]

})



df



    A   B   C   D   E   F   X0  X1  X2  X3  X4  X5  X6  X7

0   Y0  Y2  Y7  Y2  Y6  Y0  100 101 102 103 104 105 106 107

1   Y2  Y5  Y1  Y5  Y3  Y5  200 201 202 203 204 205 206 207

2   Y5  Y0  Y4  Y7  Y1  Y2  300 301 302 303 304 305 306 307

3   Y1  Y2  Y2  Y7  Y6  Y0  400 401 402 403 404 405 406 407

4   Y3  Y7  Y5  Y1  Y1  Y0  500 501 502 503 504 505 506 507

5   Y5  Y6  Y0  Y5  Y0  Y0  600 601 602 603 604 605 606 607

and a mapping

mapping = {

    'Y0': 'X0',

    'Y1': 'X1',

    'Y2': 'X2',

    'Y3': 'X3',

    'Y4': 'X4',

    'Y5': 'X5',

    'Y6': 'X6',

    'Y7': 'X7',

}

In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).

I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.

For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.

The code I have to do this now is

for col in ['A', 'B', 'C', 'D', 'E', 'F']:

    col_result = col + '_result'

    df[col_result] = 0



    for k, v in mapping.items():

        df.loc[df[col] == k, col_result] = df[v]



df.filter(regex='_result', axis=1)



    A_result    B_result    C_result    D_result    E_result    F_result

0   100         102         107         102         106         100

1   202         205         201         205         203         205

2   305         300         304         307         301         302

3   401         402         402         407         406         400

4   503         507         505         501         501         500

5   605         606         600         605         600         600

This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

add a comment |

I have a DataFrame that looks like this

df = pd.DataFrame({

    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],

    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],

    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],

    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],

    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],

    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],

    "X0": [100, 200, 300, 400, 500, 600],

    "X1": [101, 201, 301, 401, 501, 601],

    "X2": [102, 202, 302, 402, 502, 602],

    "X3": [103, 203, 303, 403, 503, 603],

    "X4": [104, 204, 304, 404, 504, 604],

    "X5": [105, 205, 305, 405, 505, 605],

    "X6": [106, 206, 306, 406, 506, 606],

    "X7": [107, 207, 307, 407, 507, 607]

})



df



    A   B   C   D   E   F   X0  X1  X2  X3  X4  X5  X6  X7

0   Y0  Y2  Y7  Y2  Y6  Y0  100 101 102 103 104 105 106 107

1   Y2  Y5  Y1  Y5  Y3  Y5  200 201 202 203 204 205 206 207

2   Y5  Y0  Y4  Y7  Y1  Y2  300 301 302 303 304 305 306 307

3   Y1  Y2  Y2  Y7  Y6  Y0  400 401 402 403 404 405 406 407

4   Y3  Y7  Y5  Y1  Y1  Y0  500 501 502 503 504 505 506 507

5   Y5  Y6  Y0  Y5  Y0  Y0  600 601 602 603 604 605 606 607

and a mapping

mapping = {

    'Y0': 'X0',

    'Y1': 'X1',

    'Y2': 'X2',

    'Y3': 'X3',

    'Y4': 'X4',

    'Y5': 'X5',

    'Y6': 'X6',

    'Y7': 'X7',

}

In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).

I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.

For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.

The code I have to do this now is

for col in ['A', 'B', 'C', 'D', 'E', 'F']:

    col_result = col + '_result'

    df[col_result] = 0



    for k, v in mapping.items():

        df.loc[df[col] == k, col_result] = df[v]



df.filter(regex='_result', axis=1)



    A_result    B_result    C_result    D_result    E_result    F_result

0   100         102         107         102         106         100

1   202         205         201         205         203         205

2   305         300         304         307         301         302

3   401         402         402         407         406         400

4   503         507         505         501         501         500

5   605         606         600         605         600         600

This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

add a comment |

I have a DataFrame that looks like this

df = pd.DataFrame({

    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],

    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],

    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],

    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],

    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],

    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],

    "X0": [100, 200, 300, 400, 500, 600],

    "X1": [101, 201, 301, 401, 501, 601],

    "X2": [102, 202, 302, 402, 502, 602],

    "X3": [103, 203, 303, 403, 503, 603],

    "X4": [104, 204, 304, 404, 504, 604],

    "X5": [105, 205, 305, 405, 505, 605],

    "X6": [106, 206, 306, 406, 506, 606],

    "X7": [107, 207, 307, 407, 507, 607]

})



df



    A   B   C   D   E   F   X0  X1  X2  X3  X4  X5  X6  X7

0   Y0  Y2  Y7  Y2  Y6  Y0  100 101 102 103 104 105 106 107

1   Y2  Y5  Y1  Y5  Y3  Y5  200 201 202 203 204 205 206 207

2   Y5  Y0  Y4  Y7  Y1  Y2  300 301 302 303 304 305 306 307

3   Y1  Y2  Y2  Y7  Y6  Y0  400 401 402 403 404 405 406 407

4   Y3  Y7  Y5  Y1  Y1  Y0  500 501 502 503 504 505 506 507

5   Y5  Y6  Y0  Y5  Y0  Y0  600 601 602 603 604 605 606 607

and a mapping

mapping = {

    'Y0': 'X0',

    'Y1': 'X1',

    'Y2': 'X2',

    'Y3': 'X3',

    'Y4': 'X4',

    'Y5': 'X5',

    'Y6': 'X6',

    'Y7': 'X7',

}

In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).

I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.

For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.

The code I have to do this now is

for col in ['A', 'B', 'C', 'D', 'E', 'F']:

    col_result = col + '_result'

    df[col_result] = 0



    for k, v in mapping.items():

        df.loc[df[col] == k, col_result] = df[v]



df.filter(regex='_result', axis=1)



    A_result    B_result    C_result    D_result    E_result    F_result

0   100         102         107         102         106         100

1   202         205         201         205         203         205

2   305         300         304         307         301         302

3   401         402         402         407         406         400

4   503         507         505         501         501         500

5   605         606         600         605         600         600

This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

I have a DataFrame that looks like this

df = pd.DataFrame({

    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],

    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],

    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],

    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],

    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],

    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],

    "X0": [100, 200, 300, 400, 500, 600],

    "X1": [101, 201, 301, 401, 501, 601],

    "X2": [102, 202, 302, 402, 502, 602],

    "X3": [103, 203, 303, 403, 503, 603],

    "X4": [104, 204, 304, 404, 504, 604],

    "X5": [105, 205, 305, 405, 505, 605],

    "X6": [106, 206, 306, 406, 506, 606],

    "X7": [107, 207, 307, 407, 507, 607]

})



df



    A   B   C   D   E   F   X0  X1  X2  X3  X4  X5  X6  X7

0   Y0  Y2  Y7  Y2  Y6  Y0  100 101 102 103 104 105 106 107

1   Y2  Y5  Y1  Y5  Y3  Y5  200 201 202 203 204 205 206 207

2   Y5  Y0  Y4  Y7  Y1  Y2  300 301 302 303 304 305 306 307

3   Y1  Y2  Y2  Y7  Y6  Y0  400 401 402 403 404 405 406 407

4   Y3  Y7  Y5  Y1  Y1  Y0  500 501 502 503 504 505 506 507

5   Y5  Y6  Y0  Y5  Y0  Y0  600 601 602 603 604 605 606 607

and a mapping

mapping = {

    'Y0': 'X0',

    'Y1': 'X1',

    'Y2': 'X2',

    'Y3': 'X3',

    'Y4': 'X4',

    'Y5': 'X5',

    'Y6': 'X6',

    'Y7': 'X7',

}

In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).

I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.

For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.

The code I have to do this now is

for col in ['A', 'B', 'C', 'D', 'E', 'F']:

    col_result = col + '_result'

    df[col_result] = 0



    for k, v in mapping.items():

        df.loc[df[col] == k, col_result] = df[v]



df.filter(regex='_result', axis=1)



    A_result    B_result    C_result    D_result    E_result    F_result

0   100         102         107         102         106         100

1   202         205         201         205         203         205

2   305         300         304         307         301         302

3   401         402         402         407         406         400

4   503         507         505         501         501         500

5   605         606         600         605         600         600

This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?

python pandas dataframe

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

asked Nov 20 '18 at 17:54

Kris Harper

2,98652864

add a comment |

1 Answer
1

active

oldest

votes

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')



#    A_results  B_results  C_results  D_results  E_results  F_results

# 0        100        102        107        102        106        100

# 1        202        205        201        205        203        205

# 2        305        300        304        307        301        302

# 3        401        402        402        407        406        400

# 4        503        507        505        501        501        500

# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116

your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53398806%2fhow-can-i-assign-values-to-a-dataframe-based-on-column-to-value-mapping%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')



#    A_results  B_results  C_results  D_results  E_results  F_results

# 0        100        102        107        102        106        100

# 1        202        205        201        205        203        205

# 2        305        300        304        307        301        302

# 3        401        402        402        407        406        400

# 4        503        507        505        501        501        500

# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116

your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

add a comment |

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')



#    A_results  B_results  C_results  D_results  E_results  F_results

# 0        100        102        107        102        106        100

# 1        202        205        201        205        203        205

# 2        305        300        304        307        301        302

# 3        401        402        402        407        406        400

# 4        503        507        505        501        501        500

# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116

your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

add a comment |

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')



#    A_results  B_results  C_results  D_results  E_results  F_results

# 0        100        102        107        102        106        100

# 1        202        205        201        205        203        205

# 2        305        300        304        307        301        302

# 3        401        402        402        407        406        400

# 4        503        507        505        501        501        500

# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116

your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')



#    A_results  B_results  C_results  D_results  E_results  F_results

# 0        100        102        107        102        106        100

# 1        202        205        201        205        203        205

# 2        305        300        304        307        301        302

# 3        401        402        402        407        406        400

# 4        503        507        505        501        501        500

# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116

your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

edited Nov 20 '18 at 19:09

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

answered Nov 20 '18 at 18:21

Idlehands

4,4551518

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

add a comment |

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

Apologies, I mistimed, this in indeed much faster. +1 !

– jpp
Nov 20 '18 at 19:01

@jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

– Idlehands
Nov 20 '18 at 19:02

Awesome, this cut the execution time in half.

– Kris Harper
Nov 22 '18 at 19:33

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu