How can I assign values to a DataFrame based on column-to-value mapping?












3















I have a DataFrame that looks like this



df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})

df

A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607


and a mapping



mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}


In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).



I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.



For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.



The code I have to do this now is



for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0

for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]

df.filter(regex='_result', axis=1)

A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600


This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?










share|improve this question



























    3















    I have a DataFrame that looks like this



    df = pd.DataFrame({
    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
    "X0": [100, 200, 300, 400, 500, 600],
    "X1": [101, 201, 301, 401, 501, 601],
    "X2": [102, 202, 302, 402, 502, 602],
    "X3": [103, 203, 303, 403, 503, 603],
    "X4": [104, 204, 304, 404, 504, 604],
    "X5": [105, 205, 305, 405, 505, 605],
    "X6": [106, 206, 306, 406, 506, 606],
    "X7": [107, 207, 307, 407, 507, 607]
    })

    df

    A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
    0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
    1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
    2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
    3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
    4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
    5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607


    and a mapping



    mapping = {
    'Y0': 'X0',
    'Y1': 'X1',
    'Y2': 'X2',
    'Y3': 'X3',
    'Y4': 'X4',
    'Y5': 'X5',
    'Y6': 'X6',
    'Y7': 'X7',
    }


    In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).



    I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.



    For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.



    The code I have to do this now is



    for col in ['A', 'B', 'C', 'D', 'E', 'F']:
    col_result = col + '_result'
    df[col_result] = 0

    for k, v in mapping.items():
    df.loc[df[col] == k, col_result] = df[v]

    df.filter(regex='_result', axis=1)

    A_result B_result C_result D_result E_result F_result
    0 100 102 107 102 106 100
    1 202 205 201 205 203 205
    2 305 300 304 307 301 302
    3 401 402 402 407 406 400
    4 503 507 505 501 501 500
    5 605 606 600 605 600 600


    This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?










    share|improve this question

























      3












      3








      3








      I have a DataFrame that looks like this



      df = pd.DataFrame({
      "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
      "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
      "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
      "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
      "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
      "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
      "X0": [100, 200, 300, 400, 500, 600],
      "X1": [101, 201, 301, 401, 501, 601],
      "X2": [102, 202, 302, 402, 502, 602],
      "X3": [103, 203, 303, 403, 503, 603],
      "X4": [104, 204, 304, 404, 504, 604],
      "X5": [105, 205, 305, 405, 505, 605],
      "X6": [106, 206, 306, 406, 506, 606],
      "X7": [107, 207, 307, 407, 507, 607]
      })

      df

      A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
      0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
      1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
      2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
      3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
      4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
      5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607


      and a mapping



      mapping = {
      'Y0': 'X0',
      'Y1': 'X1',
      'Y2': 'X2',
      'Y3': 'X3',
      'Y4': 'X4',
      'Y5': 'X5',
      'Y6': 'X6',
      'Y7': 'X7',
      }


      In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).



      I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.



      For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.



      The code I have to do this now is



      for col in ['A', 'B', 'C', 'D', 'E', 'F']:
      col_result = col + '_result'
      df[col_result] = 0

      for k, v in mapping.items():
      df.loc[df[col] == k, col_result] = df[v]

      df.filter(regex='_result', axis=1)

      A_result B_result C_result D_result E_result F_result
      0 100 102 107 102 106 100
      1 202 205 201 205 203 205
      2 305 300 304 307 301 302
      3 401 402 402 407 406 400
      4 503 507 505 501 501 500
      5 605 606 600 605 600 600


      This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?










      share|improve this question














      I have a DataFrame that looks like this



      df = pd.DataFrame({
      "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
      "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
      "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
      "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
      "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
      "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
      "X0": [100, 200, 300, 400, 500, 600],
      "X1": [101, 201, 301, 401, 501, 601],
      "X2": [102, 202, 302, 402, 502, 602],
      "X3": [103, 203, 303, 403, 503, 603],
      "X4": [104, 204, 304, 404, 504, 604],
      "X5": [105, 205, 305, 405, 505, 605],
      "X6": [106, 206, 306, 406, 506, 606],
      "X7": [107, 207, 307, 407, 507, 607]
      })

      df

      A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
      0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
      1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
      2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
      3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
      4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
      5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607


      and a mapping



      mapping = {
      'Y0': 'X0',
      'Y1': 'X1',
      'Y2': 'X2',
      'Y3': 'X3',
      'Y4': 'X4',
      'Y5': 'X5',
      'Y6': 'X6',
      'Y7': 'X7',
      }


      In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).



      I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.



      For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.



      The code I have to do this now is



      for col in ['A', 'B', 'C', 'D', 'E', 'F']:
      col_result = col + '_result'
      df[col_result] = 0

      for k, v in mapping.items():
      df.loc[df[col] == k, col_result] = df[v]

      df.filter(regex='_result', axis=1)

      A_result B_result C_result D_result E_result F_result
      0 100 102 107 102 106 100
      1 202 205 201 205 203 205
      2 305 300 304 307 301 302
      3 401 402 402 407 406 400
      4 503 507 505 501 501 500
      5 605 606 600 605 600 600


      This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?







      python pandas dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 20 '18 at 17:54









      Kris HarperKris Harper

      2,98652864




      2,98652864
























          1 Answer
          1






          active

          oldest

          votes


















          2














          Optimized a bit now, you can try this:



          df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

          # A_results B_results C_results D_results E_results F_results
          # 0 100 102 107 102 106 100
          # 1 202 205 201 205 203 205
          # 2 305 300 304 307 301 302
          # 3 401 402 402 407 406 400
          # 4 503 507 505 501 501 500
          # 5 605 606 600 605 600 600


          Test speed on n=1000:



          this method = 6.141038116
          your method = 96.079351477


          Worth noting that on single runs the performance doesn't seem to vary that much though.






          share|improve this answer


























          • Apologies, I mistimed, this in indeed much faster. +1 !

            – jpp
            Nov 20 '18 at 19:01













          • @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

            – Idlehands
            Nov 20 '18 at 19:02











          • Awesome, this cut the execution time in half.

            – Kris Harper
            Nov 22 '18 at 19:33











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53398806%2fhow-can-i-assign-values-to-a-dataframe-based-on-column-to-value-mapping%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          Optimized a bit now, you can try this:



          df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

          # A_results B_results C_results D_results E_results F_results
          # 0 100 102 107 102 106 100
          # 1 202 205 201 205 203 205
          # 2 305 300 304 307 301 302
          # 3 401 402 402 407 406 400
          # 4 503 507 505 501 501 500
          # 5 605 606 600 605 600 600


          Test speed on n=1000:



          this method = 6.141038116
          your method = 96.079351477


          Worth noting that on single runs the performance doesn't seem to vary that much though.






          share|improve this answer


























          • Apologies, I mistimed, this in indeed much faster. +1 !

            – jpp
            Nov 20 '18 at 19:01













          • @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

            – Idlehands
            Nov 20 '18 at 19:02











          • Awesome, this cut the execution time in half.

            – Kris Harper
            Nov 22 '18 at 19:33
















          2














          Optimized a bit now, you can try this:



          df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

          # A_results B_results C_results D_results E_results F_results
          # 0 100 102 107 102 106 100
          # 1 202 205 201 205 203 205
          # 2 305 300 304 307 301 302
          # 3 401 402 402 407 406 400
          # 4 503 507 505 501 501 500
          # 5 605 606 600 605 600 600


          Test speed on n=1000:



          this method = 6.141038116
          your method = 96.079351477


          Worth noting that on single runs the performance doesn't seem to vary that much though.






          share|improve this answer


























          • Apologies, I mistimed, this in indeed much faster. +1 !

            – jpp
            Nov 20 '18 at 19:01













          • @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

            – Idlehands
            Nov 20 '18 at 19:02











          • Awesome, this cut the execution time in half.

            – Kris Harper
            Nov 22 '18 at 19:33














          2












          2








          2







          Optimized a bit now, you can try this:



          df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

          # A_results B_results C_results D_results E_results F_results
          # 0 100 102 107 102 106 100
          # 1 202 205 201 205 203 205
          # 2 305 300 304 307 301 302
          # 3 401 402 402 407 406 400
          # 4 503 507 505 501 501 500
          # 5 605 606 600 605 600 600


          Test speed on n=1000:



          this method = 6.141038116
          your method = 96.079351477


          Worth noting that on single runs the performance doesn't seem to vary that much though.






          share|improve this answer















          Optimized a bit now, you can try this:



          df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

          # A_results B_results C_results D_results E_results F_results
          # 0 100 102 107 102 106 100
          # 1 202 205 201 205 203 205
          # 2 305 300 304 307 301 302
          # 3 401 402 402 407 406 400
          # 4 503 507 505 501 501 500
          # 5 605 606 600 605 600 600


          Test speed on n=1000:



          this method = 6.141038116
          your method = 96.079351477


          Worth noting that on single runs the performance doesn't seem to vary that much though.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 20 '18 at 19:09

























          answered Nov 20 '18 at 18:21









          IdlehandsIdlehands

          4,4551518




          4,4551518













          • Apologies, I mistimed, this in indeed much faster. +1 !

            – jpp
            Nov 20 '18 at 19:01













          • @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

            – Idlehands
            Nov 20 '18 at 19:02











          • Awesome, this cut the execution time in half.

            – Kris Harper
            Nov 22 '18 at 19:33



















          • Apologies, I mistimed, this in indeed much faster. +1 !

            – jpp
            Nov 20 '18 at 19:01













          • @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

            – Idlehands
            Nov 20 '18 at 19:02











          • Awesome, this cut the execution time in half.

            – Kris Harper
            Nov 22 '18 at 19:33

















          Apologies, I mistimed, this in indeed much faster. +1 !

          – jpp
          Nov 20 '18 at 19:01







          Apologies, I mistimed, this in indeed much faster. +1 !

          – jpp
          Nov 20 '18 at 19:01















          @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

          – Idlehands
          Nov 20 '18 at 19:02





          @jpp I just edited the answer. The original answer did seem a bit slow, seems the 2 separate operation (apply/column) took a toll on the run time. Doing it all in one go seems to be an improvement. Weird that when I run it individually the time are comparable (~2s) but when running in bulk (n=1000) the difference is quite noticeable.

          – Idlehands
          Nov 20 '18 at 19:02













          Awesome, this cut the execution time in half.

          – Kris Harper
          Nov 22 '18 at 19:33





          Awesome, this cut the execution time in half.

          – Kris Harper
          Nov 22 '18 at 19:33


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53398806%2fhow-can-i-assign-values-to-a-dataframe-based-on-column-to-value-mapping%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          MongoDB - Not Authorized To Execute Command

          in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith

          How to fix TextFormField cause rebuild widget in Flutter