Compare dataframes based on row values in Pandas

What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?

For example: Here's a sample from dataset one -

#### Row Item Color Price

01 Shirt Red $30

02 Hat Blue $10

And a sample from dataframe two -

#### Row Item Color Price

01 Trouser Black $20

02 Bag Yellow $ 30

03 Hat Blue $10

04 Shirt Red $30

So if I wanted to compare all rows in both datasets where there is a Shirt row.

what is the best way to do it?

I'm using Pandas/Python3.7

Thanks!

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

1

What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26

The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27

add a comment |

What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?

For example: Here's a sample from dataset one -

#### Row Item Color Price

01 Shirt Red $30

02 Hat Blue $10

And a sample from dataframe two -

#### Row Item Color Price

01 Trouser Black $20

02 Bag Yellow $ 30

03 Hat Blue $10

04 Shirt Red $30

So if I wanted to compare all rows in both datasets where there is a Shirt row.

what is the best way to do it?

I'm using Pandas/Python3.7

Thanks!

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

1

What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26

The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27

add a comment |

What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?

For example: Here's a sample from dataset one -

#### Row Item Color Price

01 Shirt Red $30

02 Hat Blue $10

And a sample from dataframe two -

#### Row Item Color Price

01 Trouser Black $20

02 Bag Yellow $ 30

03 Hat Blue $10

04 Shirt Red $30

So if I wanted to compare all rows in both datasets where there is a Shirt row.

what is the best way to do it?

I'm using Pandas/Python3.7

Thanks!

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?

For example: Here's a sample from dataset one -

#### Row Item Color Price

01 Shirt Red $30

02 Hat Blue $10

And a sample from dataframe two -

#### Row Item Color Price

01 Trouser Black $20

02 Bag Yellow $ 30

03 Hat Blue $10

04 Shirt Red $30

So if I wanted to compare all rows in both datasets where there is a Shirt row.

what is the best way to do it?

I'm using Pandas/Python3.7

Thanks!

python python-3.x pandas dataframe data-analysis

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

edited Dec 19 '18 at 6:34

Mayank Porwal

4,9072724

asked Nov 20 '18 at 14:23

Greedy Coder

6118

asked Nov 20 '18 at 14:23

Greedy Coder

6118

asked Nov 20 '18 at 14:23

Greedy Coder

6118

1

What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26

The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27

add a comment |

1

What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26

The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27

What is the expected output?

– Mayank Porwal
Nov 20 '18 at 14:26

The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.

– Greedy Coder
Nov 20 '18 at 14:27

add a comment |

3 Answers
3

active

oldest

votes

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})



    c1  c2  c3

0   abc xyz 0

1   abc mno 0

2   iop yut 0

3   iop trg 0





DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})

    c1  c2

0   iop yut

1   abc mno

2   bhj uio

3   iop yut

4   xdf edc

match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')

print(match)



    c1  c2  c3

0   abc mno 0

1   iop yut 0

2   iop yut 0

So by using pd.merge you can get the matching rows

answered Nov 20 '18 at 14:26

anky_91

2,8032318

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

1

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

add a comment |

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

add a comment |

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']

Out[89]: 

   Row_x   Item Color_x Price_x  Row_y Color_y Price_y

0      1  Shirt     Red     $30      4     Red     $30

answered Nov 20 '18 at 14:30

W-B

107k83165

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395097%2fcompare-dataframes-based-on-row-values-in-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})



    c1  c2  c3

0   abc xyz 0

1   abc mno 0

2   iop yut 0

3   iop trg 0





DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})

    c1  c2

0   iop yut

1   abc mno

2   bhj uio

3   iop yut

4   xdf edc

match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')

print(match)



    c1  c2  c3

0   abc mno 0

1   iop yut 0

2   iop yut 0

So by using pd.merge you can get the matching rows

answered Nov 20 '18 at 14:26

anky_91

2,8032318

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

1

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

add a comment |

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})



    c1  c2  c3

0   abc xyz 0

1   abc mno 0

2   iop yut 0

3   iop trg 0





DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})

    c1  c2

0   iop yut

1   abc mno

2   bhj uio

3   iop yut

4   xdf edc

match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')

print(match)



    c1  c2  c3

0   abc mno 0

1   iop yut 0

2   iop yut 0

So by using pd.merge you can get the matching rows

answered Nov 20 '18 at 14:26

anky_91

2,8032318

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

1

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

add a comment |

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})



    c1  c2  c3

0   abc xyz 0

1   abc mno 0

2   iop yut 0

3   iop trg 0





DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})

    c1  c2

0   iop yut

1   abc mno

2   bhj uio

3   iop yut

4   xdf edc

match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')

print(match)



    c1  c2  c3

0   abc mno 0

1   iop yut 0

2   iop yut 0

So by using pd.merge you can get the matching rows

answered Nov 20 '18 at 14:26

anky_91

2,8032318

check this example:

DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})



    c1  c2  c3

0   abc xyz 0

1   abc mno 0

2   iop yut 0

3   iop trg 0





DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})

    c1  c2

0   iop yut

1   abc mno

2   bhj uio

3   iop yut

4   xdf edc

match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')

print(match)



    c1  c2  c3

0   abc mno 0

1   iop yut 0

2   iop yut 0

So by using pd.merge you can get the matching rows

answered Nov 20 '18 at 14:26

anky_91

2,8032318

answered Nov 20 '18 at 14:26

anky_91

2,8032318

answered Nov 20 '18 at 14:26

anky_91

2,8032318

answered Nov 20 '18 at 14:26

anky_91

2,8032318

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

1

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

add a comment |

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

1

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?

– Greedy Coder
Nov 20 '18 at 14:30

That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)

– Greedy Coder
Nov 20 '18 at 14:33

sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)

– anky_91
Nov 20 '18 at 14:38

@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks

– anky_91
Nov 27 '18 at 13:03

add a comment |

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

add a comment |

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

add a comment |

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

You can simply use pandas merge like this:

pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')

This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.

Let me know if this is what you want.

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

answered Nov 20 '18 at 14:28

Mayank Porwal

4,9072724

add a comment |

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']

Out[89]: 

   Row_x   Item Color_x Price_x  Row_y Color_y Price_y

0      1  Shirt     Red     $30      4     Red     $30

answered Nov 20 '18 at 14:30

W-B

107k83165

add a comment |

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']

Out[89]: 

   Row_x   Item Color_x Price_x  Row_y Color_y Price_y

0      1  Shirt     Red     $30      4     Red     $30

answered Nov 20 '18 at 14:30

W-B

107k83165

add a comment |

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']

Out[89]: 

   Row_x   Item Color_x Price_x  Row_y Color_y Price_y

0      1  Shirt     Red     $30      4     Red     $30

answered Nov 20 '18 at 14:30

W-B

107k83165

Filter after merge

df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']

Out[89]: 

   Row_x   Item Color_x Price_x  Row_y Color_y Price_y

0      1  Shirt     Red     $30      4     Red     $30

answered Nov 20 '18 at 14:30

W-B

107k83165

answered Nov 20 '18 at 14:30

W-B

107k83165

answered Nov 20 '18 at 14:30

W-B

107k83165

answered Nov 20 '18 at 14:30

W-B

107k83165

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu