Compare dataframes based on row values in Pandas












0















What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?



For example: Here's a sample from dataset one -



#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10


And a sample from dataframe two -



#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30


So if I wanted to compare all rows in both datasets where there is a Shirt row.



what is the best way to do it?



I'm using Pandas/Python3.7



Thanks!










share|improve this question




















  • 1





    What is the expected output?

    – Mayank Porwal
    Nov 20 '18 at 14:26











  • The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

    – Greedy Coder
    Nov 20 '18 at 14:27


















0















What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?



For example: Here's a sample from dataset one -



#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10


And a sample from dataframe two -



#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30


So if I wanted to compare all rows in both datasets where there is a Shirt row.



what is the best way to do it?



I'm using Pandas/Python3.7



Thanks!










share|improve this question




















  • 1





    What is the expected output?

    – Mayank Porwal
    Nov 20 '18 at 14:26











  • The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

    – Greedy Coder
    Nov 20 '18 at 14:27
















0












0








0








What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?



For example: Here's a sample from dataset one -



#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10


And a sample from dataframe two -



#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30


So if I wanted to compare all rows in both datasets where there is a Shirt row.



what is the best way to do it?



I'm using Pandas/Python3.7



Thanks!










share|improve this question
















What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?



For example: Here's a sample from dataset one -



#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10


And a sample from dataframe two -



#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30


So if I wanted to compare all rows in both datasets where there is a Shirt row.



what is the best way to do it?



I'm using Pandas/Python3.7



Thanks!







python python-3.x pandas dataframe data-analysis






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 19 '18 at 6:34









Mayank Porwal

4,9072724




4,9072724










asked Nov 20 '18 at 14:23









Greedy CoderGreedy Coder

6118




6118








  • 1





    What is the expected output?

    – Mayank Porwal
    Nov 20 '18 at 14:26











  • The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

    – Greedy Coder
    Nov 20 '18 at 14:27
















  • 1





    What is the expected output?

    – Mayank Porwal
    Nov 20 '18 at 14:26











  • The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

    – Greedy Coder
    Nov 20 '18 at 14:27










1




1





What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26





What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26













The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27







The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27














3 Answers
3






active

oldest

votes


















1














check this example:



DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0


DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)

c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0


So by using pd.merge you can get the matching rows






share|improve this answer
























  • So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

    – Greedy Coder
    Nov 20 '18 at 14:30






  • 1





    That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

    – Greedy Coder
    Nov 20 '18 at 14:33











  • sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

    – anky_91
    Nov 20 '18 at 14:38













  • @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

    – anky_91
    Nov 27 '18 at 13:03



















1














You can simply use pandas merge like this:



pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')


This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.



Let me know if this is what you want.






share|improve this answer































    1














    Filter after merge



    df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
    Out[89]:
    Row_x Item Color_x Price_x Row_y Color_y Price_y
    0 1 Shirt Red $30 4 Red $30





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395097%2fcompare-dataframes-based-on-row-values-in-pandas%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      check this example:



      DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

      c1 c2 c3
      0 abc xyz 0
      1 abc mno 0
      2 iop yut 0
      3 iop trg 0


      DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
      c1 c2
      0 iop yut
      1 abc mno
      2 bhj uio
      3 iop yut
      4 xdf edc
      match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
      print(match)

      c1 c2 c3
      0 abc mno 0
      1 iop yut 0
      2 iop yut 0


      So by using pd.merge you can get the matching rows






      share|improve this answer
























      • So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

        – Greedy Coder
        Nov 20 '18 at 14:30






      • 1





        That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

        – Greedy Coder
        Nov 20 '18 at 14:33











      • sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

        – anky_91
        Nov 20 '18 at 14:38













      • @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

        – anky_91
        Nov 27 '18 at 13:03
















      1














      check this example:



      DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

      c1 c2 c3
      0 abc xyz 0
      1 abc mno 0
      2 iop yut 0
      3 iop trg 0


      DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
      c1 c2
      0 iop yut
      1 abc mno
      2 bhj uio
      3 iop yut
      4 xdf edc
      match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
      print(match)

      c1 c2 c3
      0 abc mno 0
      1 iop yut 0
      2 iop yut 0


      So by using pd.merge you can get the matching rows






      share|improve this answer
























      • So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

        – Greedy Coder
        Nov 20 '18 at 14:30






      • 1





        That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

        – Greedy Coder
        Nov 20 '18 at 14:33











      • sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

        – anky_91
        Nov 20 '18 at 14:38













      • @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

        – anky_91
        Nov 27 '18 at 13:03














      1












      1








      1







      check this example:



      DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

      c1 c2 c3
      0 abc xyz 0
      1 abc mno 0
      2 iop yut 0
      3 iop trg 0


      DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
      c1 c2
      0 iop yut
      1 abc mno
      2 bhj uio
      3 iop yut
      4 xdf edc
      match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
      print(match)

      c1 c2 c3
      0 abc mno 0
      1 iop yut 0
      2 iop yut 0


      So by using pd.merge you can get the matching rows






      share|improve this answer













      check this example:



      DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})

      c1 c2 c3
      0 abc xyz 0
      1 abc mno 0
      2 iop yut 0
      3 iop trg 0


      DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
      c1 c2
      0 iop yut
      1 abc mno
      2 bhj uio
      3 iop yut
      4 xdf edc
      match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
      print(match)

      c1 c2 c3
      0 abc mno 0
      1 iop yut 0
      2 iop yut 0


      So by using pd.merge you can get the matching rows







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Nov 20 '18 at 14:26









      anky_91anky_91

      2,8032318




      2,8032318













      • So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

        – Greedy Coder
        Nov 20 '18 at 14:30






      • 1





        That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

        – Greedy Coder
        Nov 20 '18 at 14:33











      • sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

        – anky_91
        Nov 20 '18 at 14:38













      • @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

        – anky_91
        Nov 27 '18 at 13:03



















      • So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

        – Greedy Coder
        Nov 20 '18 at 14:30






      • 1





        That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

        – Greedy Coder
        Nov 20 '18 at 14:33











      • sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

        – anky_91
        Nov 20 '18 at 14:38













      • @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

        – anky_91
        Nov 27 '18 at 13:03

















      So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

      – Greedy Coder
      Nov 20 '18 at 14:30





      So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

      – Greedy Coder
      Nov 20 '18 at 14:30




      1




      1





      That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

      – Greedy Coder
      Nov 20 '18 at 14:33





      That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

      – Greedy Coder
      Nov 20 '18 at 14:33













      sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

      – anky_91
      Nov 20 '18 at 14:38







      sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

      – anky_91
      Nov 20 '18 at 14:38















      @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

      – anky_91
      Nov 27 '18 at 13:03





      @GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

      – anky_91
      Nov 27 '18 at 13:03













      1














      You can simply use pandas merge like this:



      pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')


      This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.



      Let me know if this is what you want.






      share|improve this answer




























        1














        You can simply use pandas merge like this:



        pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')


        This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.



        Let me know if this is what you want.






        share|improve this answer


























          1












          1








          1







          You can simply use pandas merge like this:



          pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')


          This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.



          Let me know if this is what you want.






          share|improve this answer













          You can simply use pandas merge like this:



          pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')


          This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.



          Let me know if this is what you want.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 20 '18 at 14:28









          Mayank PorwalMayank Porwal

          4,9072724




          4,9072724























              1














              Filter after merge



              df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
              Out[89]:
              Row_x Item Color_x Price_x Row_y Color_y Price_y
              0 1 Shirt Red $30 4 Red $30





              share|improve this answer




























                1














                Filter after merge



                df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
                Out[89]:
                Row_x Item Color_x Price_x Row_y Color_y Price_y
                0 1 Shirt Red $30 4 Red $30





                share|improve this answer


























                  1












                  1








                  1







                  Filter after merge



                  df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
                  Out[89]:
                  Row_x Item Color_x Price_x Row_y Color_y Price_y
                  0 1 Shirt Red $30 4 Red $30





                  share|improve this answer













                  Filter after merge



                  df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
                  Out[89]:
                  Row_x Item Color_x Price_x Row_y Color_y Price_y
                  0 1 Shirt Red $30 4 Red $30






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 20 '18 at 14:30









                  W-BW-B

                  107k83165




                  107k83165






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395097%2fcompare-dataframes-based-on-row-values-in-pandas%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      MongoDB - Not Authorized To Execute Command

                      How to fix TextFormField cause rebuild widget in Flutter

                      Npm cannot find a required file even through it is in the searched directory