Compare dataframes based on row values in Pandas
What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?
For example: Here's a sample from dataset one -
#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10
And a sample from dataframe two -
#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30
So if I wanted to compare all rows in both datasets where there is a Shirt row.
what is the best way to do it?
I'm using Pandas/Python3.7
Thanks!
python python-3.x pandas dataframe data-analysis
add a comment |
What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?
For example: Here's a sample from dataset one -
#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10
And a sample from dataframe two -
#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30
So if I wanted to compare all rows in both datasets where there is a Shirt row.
what is the best way to do it?
I'm using Pandas/Python3.7
Thanks!
python python-3.x pandas dataframe data-analysis
1
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27
add a comment |
What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?
For example: Here's a sample from dataset one -
#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10
And a sample from dataframe two -
#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30
So if I wanted to compare all rows in both datasets where there is a Shirt row.
what is the best way to do it?
I'm using Pandas/Python3.7
Thanks!
python python-3.x pandas dataframe data-analysis
What is the best way to compare two datasets (.csv files) using Pandas where there is no 1:1 cardinality between the data?
For example: Here's a sample from dataset one -
#### Row Item Color Price
01 Shirt Red $30
02 Hat Blue $10
And a sample from dataframe two -
#### Row Item Color Price
01 Trouser Black $20
02 Bag Yellow $ 30
03 Hat Blue $10
04 Shirt Red $30
So if I wanted to compare all rows in both datasets where there is a Shirt row.
what is the best way to do it?
I'm using Pandas/Python3.7
Thanks!
python python-3.x pandas dataframe data-analysis
python python-3.x pandas dataframe data-analysis
edited Dec 19 '18 at 6:34
Mayank Porwal
4,9072724
4,9072724
asked Nov 20 '18 at 14:23
Greedy CoderGreedy Coder
6118
6118
1
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27
add a comment |
1
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27
1
1
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27
add a comment |
3 Answers
3
active
oldest
votes
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
add a comment |
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
add a comment |
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395097%2fcompare-dataframes-based-on-row-values-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
add a comment |
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
add a comment |
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
check this example:
DF1 = pd.DataFrame(data={'c1':['abc','abc','iop','iop'],'c2':['xyz','mno','yut','trg'],'c3':[0,0,0,0]})
c1 c2 c3
0 abc xyz 0
1 abc mno 0
2 iop yut 0
3 iop trg 0
DF2 = pd.DataFrame(data={'c1':['iop','abc','bhj','iop','xdf'],'c2':['yut','mno','uio','yut','edc']})
c1 c2
0 iop yut
1 abc mno
2 bhj uio
3 iop yut
4 xdf edc
match = pd.merge(DF1,DF2,on=['c1','c2'],how='inner')
print(match)
c1 c2 c3
0 abc mno 0
1 iop yut 0
2 iop yut 0
So by using pd.merge
you can get the matching rows
answered Nov 20 '18 at 14:26


anky_91anky_91
2,8032318
2,8032318
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
add a comment |
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
So I don't know where the row value Shirt is present in both datasets. Can I still use inner merge?
– Greedy Coder
Nov 20 '18 at 14:30
1
1
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
That's awesome! I'm going to try this and update. Using multiple columns as a list also helps, because I have a set of 'key' columns that I want to compare if the row element is found to be the same. :)
– Greedy Coder
Nov 20 '18 at 14:33
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
sure, you can check the docs so as to handle the suffixes given by merge function. you can also drop them if unwanted :)
– anky_91
Nov 20 '18 at 14:38
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
@GreedyCoder : if any of the answers helped, please dont forget to accept one of the answers so people can refer this question later and find their solution. :) thanks
– anky_91
Nov 27 '18 at 13:03
add a comment |
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
add a comment |
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
add a comment |
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
You can simply use pandas merge
like this:
pd.merge(df1[df1.Item == 'Shirt'], df2[df2.Item == 'Shirt'], on=['Item','Color', 'Price')
This will produce an output only when all columns in both dataframes match for Item='Shirt' and corresponding columns are also equal.
Let me know if this is what you want.
answered Nov 20 '18 at 14:28
Mayank PorwalMayank Porwal
4,9072724
4,9072724
add a comment |
add a comment |
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
add a comment |
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
add a comment |
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
Filter after merge
df1.merge(df2,on='Item').loc[lambda x : x.Item=='Shirt']
Out[89]:
Row_x Item Color_x Price_x Row_y Color_y Price_y
0 1 Shirt Red $30 4 Red $30
answered Nov 20 '18 at 14:30


W-BW-B
107k83165
107k83165
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53395097%2fcompare-dataframes-based-on-row-values-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
What is the expected output?
– Mayank Porwal
Nov 20 '18 at 14:26
The expected output should be - Shirt Red $30 matching with Shirt Red $30 from both dataframes and Hat Blue $10 matching with Hat Blue $10 from both dataframes. So basically, if there is a row values called Shirt in both dataframes, then all columns should be compared for those rows.
– Greedy Coder
Nov 20 '18 at 14:27