Populate Pandas DataFrame using a dictionary based on a condition












0















I have a DataFrame



>> test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4], 'C': [np.nan, np.nan, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan]})
A B C D
0 a 1
1 b 2
2 b 3
3 b 4


I also have a dictionary, where b in input_b signifies that I'm only modifying rows where row.A = b.



>> input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


How do I populate the DataFrame with values from the dictionary to get



    A   B   C       D
0 a 1
1 b 2 Moon Elephant
2 b 3
3 b 4 Sun Mouse









share|improve this question

























  • I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

    – GingerBadger
    Jan 2 at 23:55






  • 1





    I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

    – KenHBS
    Jan 3 at 0:30


















0















I have a DataFrame



>> test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4], 'C': [np.nan, np.nan, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan]})
A B C D
0 a 1
1 b 2
2 b 3
3 b 4


I also have a dictionary, where b in input_b signifies that I'm only modifying rows where row.A = b.



>> input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


How do I populate the DataFrame with values from the dictionary to get



    A   B   C       D
0 a 1
1 b 2 Moon Elephant
2 b 3
3 b 4 Sun Mouse









share|improve this question

























  • I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

    – GingerBadger
    Jan 2 at 23:55






  • 1





    I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

    – KenHBS
    Jan 3 at 0:30
















0












0








0








I have a DataFrame



>> test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4], 'C': [np.nan, np.nan, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan]})
A B C D
0 a 1
1 b 2
2 b 3
3 b 4


I also have a dictionary, where b in input_b signifies that I'm only modifying rows where row.A = b.



>> input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


How do I populate the DataFrame with values from the dictionary to get



    A   B   C       D
0 a 1
1 b 2 Moon Elephant
2 b 3
3 b 4 Sun Mouse









share|improve this question
















I have a DataFrame



>> test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4], 'C': [np.nan, np.nan, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan]})
A B C D
0 a 1
1 b 2
2 b 3
3 b 4


I also have a dictionary, where b in input_b signifies that I'm only modifying rows where row.A = b.



>> input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


How do I populate the DataFrame with values from the dictionary to get



    A   B   C       D
0 a 1
1 b 2 Moon Elephant
2 b 3
3 b 4 Sun Mouse






python pandas dataframe assign






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 3 at 0:46







GingerBadger

















asked Jan 2 at 23:52









GingerBadgerGingerBadger

1166




1166













  • I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

    – GingerBadger
    Jan 2 at 23:55






  • 1





    I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

    – KenHBS
    Jan 3 at 0:30





















  • I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

    – GingerBadger
    Jan 2 at 23:55






  • 1





    I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

    – KenHBS
    Jan 3 at 0:30



















I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

– GingerBadger
Jan 2 at 23:55





I understand how to do this by first turning the dictionary into a dataframe and then merge'ing it with the first. However, this gets very slow if the first dataframe (test) is much larger than its portion that I wish to modify.

– GingerBadger
Jan 2 at 23:55




1




1





I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

– KenHBS
Jan 3 at 0:30







I think transforming your dictionary into a dataframe using pd.from_dict(your_dictionary) and then merging with input_b seems to me like the best solution. I would be surprised if there is a better solution.

– KenHBS
Jan 3 at 0:30














4 Answers
4






active

oldest

votes


















1














This may not be the most efficient solution, but from what I understand it got the job done:



import pandas as pd
import numpy as np

test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
'C': [np.nan, np.nan, np.nan, np.nan],
'D': [np.nan, np.nan, np.nan, np.nan]})


input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


for key, value in input_b.items():
test.loc[test['B'] == key, ['C', 'D']] = value

print(test)


Yields:



   A  B     C         D
0 a 1 NaN NaN
1 b 2 Moon Elephant
2 b 3 NaN NaN
3 b 4 Sun Mouse


This will get slower if the dictionary input_b gets too large (too many rows are being updated, too many iterations in the for loop), but should be relatively fast with small input_b's even with large test dataframes.



This answer also assumes the keys in the input_b dictionary refer to the values of the B column in the original dataframe, and will add repeated values in the C and D columns for repeated values in the B column.






share|improve this answer

































    1














    Using update



    test=test.set_index('B')
    test.update(pd.DataFrame(input_b,index=['C','D']).T)
    test=test.reset_index()
    test
    B A C D
    0 1 a NaN NaN
    1 2 b Moon Elephant
    2 3 b NaN NaN
    3 4 b Sun Mouse





    share|improve this answer































      1














      You can use loc indexing after setting your index to B:



      test = test.set_index('B')
      test.loc[input_b, ['C', 'D']] = list(input_b.values())
      test = test.reset_index()

      print(test)

      B A C D
      0 1 a NaN NaN
      1 2 b Moon Elephant
      2 3 b NaN NaN
      3 4 b Sun Mouse





      share|improve this answer































        1














        Using apply



        test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if type(x)==list else x)
        test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if type(x)==list else x)


        yields



           A  B     C         D
        0 a 1 NaN NaN
        1 b 2 Moon Elephant
        2 b 3 NaN NaN
        3 b 4 Sun Mouse





        share|improve this answer
























          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014687%2fpopulate-pandas-dataframe-using-a-dictionary-based-on-a-condition%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          4 Answers
          4






          active

          oldest

          votes








          4 Answers
          4






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          This may not be the most efficient solution, but from what I understand it got the job done:



          import pandas as pd
          import numpy as np

          test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
          'C': [np.nan, np.nan, np.nan, np.nan],
          'D': [np.nan, np.nan, np.nan, np.nan]})


          input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


          for key, value in input_b.items():
          test.loc[test['B'] == key, ['C', 'D']] = value

          print(test)


          Yields:



             A  B     C         D
          0 a 1 NaN NaN
          1 b 2 Moon Elephant
          2 b 3 NaN NaN
          3 b 4 Sun Mouse


          This will get slower if the dictionary input_b gets too large (too many rows are being updated, too many iterations in the for loop), but should be relatively fast with small input_b's even with large test dataframes.



          This answer also assumes the keys in the input_b dictionary refer to the values of the B column in the original dataframe, and will add repeated values in the C and D columns for repeated values in the B column.






          share|improve this answer






























            1














            This may not be the most efficient solution, but from what I understand it got the job done:



            import pandas as pd
            import numpy as np

            test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
            'C': [np.nan, np.nan, np.nan, np.nan],
            'D': [np.nan, np.nan, np.nan, np.nan]})


            input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


            for key, value in input_b.items():
            test.loc[test['B'] == key, ['C', 'D']] = value

            print(test)


            Yields:



               A  B     C         D
            0 a 1 NaN NaN
            1 b 2 Moon Elephant
            2 b 3 NaN NaN
            3 b 4 Sun Mouse


            This will get slower if the dictionary input_b gets too large (too many rows are being updated, too many iterations in the for loop), but should be relatively fast with small input_b's even with large test dataframes.



            This answer also assumes the keys in the input_b dictionary refer to the values of the B column in the original dataframe, and will add repeated values in the C and D columns for repeated values in the B column.






            share|improve this answer




























              1












              1








              1







              This may not be the most efficient solution, but from what I understand it got the job done:



              import pandas as pd
              import numpy as np

              test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
              'C': [np.nan, np.nan, np.nan, np.nan],
              'D': [np.nan, np.nan, np.nan, np.nan]})


              input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


              for key, value in input_b.items():
              test.loc[test['B'] == key, ['C', 'D']] = value

              print(test)


              Yields:



                 A  B     C         D
              0 a 1 NaN NaN
              1 b 2 Moon Elephant
              2 b 3 NaN NaN
              3 b 4 Sun Mouse


              This will get slower if the dictionary input_b gets too large (too many rows are being updated, too many iterations in the for loop), but should be relatively fast with small input_b's even with large test dataframes.



              This answer also assumes the keys in the input_b dictionary refer to the values of the B column in the original dataframe, and will add repeated values in the C and D columns for repeated values in the B column.






              share|improve this answer















              This may not be the most efficient solution, but from what I understand it got the job done:



              import pandas as pd
              import numpy as np

              test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
              'C': [np.nan, np.nan, np.nan, np.nan],
              'D': [np.nan, np.nan, np.nan, np.nan]})


              input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


              for key, value in input_b.items():
              test.loc[test['B'] == key, ['C', 'D']] = value

              print(test)


              Yields:



                 A  B     C         D
              0 a 1 NaN NaN
              1 b 2 Moon Elephant
              2 b 3 NaN NaN
              3 b 4 Sun Mouse


              This will get slower if the dictionary input_b gets too large (too many rows are being updated, too many iterations in the for loop), but should be relatively fast with small input_b's even with large test dataframes.



              This answer also assumes the keys in the input_b dictionary refer to the values of the B column in the original dataframe, and will add repeated values in the C and D columns for repeated values in the B column.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Jan 3 at 0:15

























              answered Jan 3 at 0:00









              PasaPasa

              35238




              35238

























                  1














                  Using update



                  test=test.set_index('B')
                  test.update(pd.DataFrame(input_b,index=['C','D']).T)
                  test=test.reset_index()
                  test
                  B A C D
                  0 1 a NaN NaN
                  1 2 b Moon Elephant
                  2 3 b NaN NaN
                  3 4 b Sun Mouse





                  share|improve this answer




























                    1














                    Using update



                    test=test.set_index('B')
                    test.update(pd.DataFrame(input_b,index=['C','D']).T)
                    test=test.reset_index()
                    test
                    B A C D
                    0 1 a NaN NaN
                    1 2 b Moon Elephant
                    2 3 b NaN NaN
                    3 4 b Sun Mouse





                    share|improve this answer


























                      1












                      1








                      1







                      Using update



                      test=test.set_index('B')
                      test.update(pd.DataFrame(input_b,index=['C','D']).T)
                      test=test.reset_index()
                      test
                      B A C D
                      0 1 a NaN NaN
                      1 2 b Moon Elephant
                      2 3 b NaN NaN
                      3 4 b Sun Mouse





                      share|improve this answer













                      Using update



                      test=test.set_index('B')
                      test.update(pd.DataFrame(input_b,index=['C','D']).T)
                      test=test.reset_index()
                      test
                      B A C D
                      0 1 a NaN NaN
                      1 2 b Moon Elephant
                      2 3 b NaN NaN
                      3 4 b Sun Mouse






                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Jan 3 at 0:42









                      Wen-BenWen-Ben

                      122k83671




                      122k83671























                          1














                          You can use loc indexing after setting your index to B:



                          test = test.set_index('B')
                          test.loc[input_b, ['C', 'D']] = list(input_b.values())
                          test = test.reset_index()

                          print(test)

                          B A C D
                          0 1 a NaN NaN
                          1 2 b Moon Elephant
                          2 3 b NaN NaN
                          3 4 b Sun Mouse





                          share|improve this answer




























                            1














                            You can use loc indexing after setting your index to B:



                            test = test.set_index('B')
                            test.loc[input_b, ['C', 'D']] = list(input_b.values())
                            test = test.reset_index()

                            print(test)

                            B A C D
                            0 1 a NaN NaN
                            1 2 b Moon Elephant
                            2 3 b NaN NaN
                            3 4 b Sun Mouse





                            share|improve this answer


























                              1












                              1








                              1







                              You can use loc indexing after setting your index to B:



                              test = test.set_index('B')
                              test.loc[input_b, ['C', 'D']] = list(input_b.values())
                              test = test.reset_index()

                              print(test)

                              B A C D
                              0 1 a NaN NaN
                              1 2 b Moon Elephant
                              2 3 b NaN NaN
                              3 4 b Sun Mouse





                              share|improve this answer













                              You can use loc indexing after setting your index to B:



                              test = test.set_index('B')
                              test.loc[input_b, ['C', 'D']] = list(input_b.values())
                              test = test.reset_index()

                              print(test)

                              B A C D
                              0 1 a NaN NaN
                              1 2 b Moon Elephant
                              2 3 b NaN NaN
                              3 4 b Sun Mouse






                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Jan 3 at 0:54









                              jppjpp

                              102k2166116




                              102k2166116























                                  1














                                  Using apply



                                  test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if type(x)==list else x)
                                  test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if type(x)==list else x)


                                  yields



                                     A  B     C         D
                                  0 a 1 NaN NaN
                                  1 b 2 Moon Elephant
                                  2 b 3 NaN NaN
                                  3 b 4 Sun Mouse





                                  share|improve this answer




























                                    1














                                    Using apply



                                    test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if type(x)==list else x)
                                    test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if type(x)==list else x)


                                    yields



                                       A  B     C         D
                                    0 a 1 NaN NaN
                                    1 b 2 Moon Elephant
                                    2 b 3 NaN NaN
                                    3 b 4 Sun Mouse





                                    share|improve this answer


























                                      1












                                      1








                                      1







                                      Using apply



                                      test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if type(x)==list else x)
                                      test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if type(x)==list else x)


                                      yields



                                         A  B     C         D
                                      0 a 1 NaN NaN
                                      1 b 2 Moon Elephant
                                      2 b 3 NaN NaN
                                      3 b 4 Sun Mouse





                                      share|improve this answer













                                      Using apply



                                      test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if type(x)==list else x)
                                      test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if type(x)==list else x)


                                      yields



                                         A  B     C         D
                                      0 a 1 NaN NaN
                                      1 b 2 Moon Elephant
                                      2 b 3 NaN NaN
                                      3 b 4 Sun Mouse






                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered Jan 3 at 1:14









                                      raninjanraninjan

                                      1315




                                      1315






























                                          draft saved

                                          draft discarded




















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid



                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function () {
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54014687%2fpopulate-pandas-dataframe-using-a-dictionary-based-on-a-condition%23new-answer', 'question_page');
                                          }
                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          MongoDB - Not Authorized To Execute Command

                                          Npm cannot find a required file even through it is in the searched directory

                                          in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith