Pandas - Performing left join across 2 Dataframe in Pandas
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.
Given below is how df1
is:
days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018
Given below is how df2
is:
date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101
I am trying to create a new Dataframe as below:
date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101
Days when a value exists in df2
, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates
new_df = pd.merge(df1, df2, how='left', left_on=['days'], right_on = ['date'])
python pandas dataframe
add a comment |
I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.
Given below is how df1
is:
days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018
Given below is how df2
is:
date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101
I am trying to create a new Dataframe as below:
date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101
Days when a value exists in df2
, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates
new_df = pd.merge(df1, df2, how='left', left_on=['days'], right_on = ['date'])
python pandas dataframe
it will because you have done left join, and your left table isdf1
which contains all the days and hence working as it should work!!
– Rahul Agarwal
Jan 3 at 9:26
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35
add a comment |
I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.
Given below is how df1
is:
days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018
Given below is how df2
is:
date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101
I am trying to create a new Dataframe as below:
date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101
Days when a value exists in df2
, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates
new_df = pd.merge(df1, df2, how='left', left_on=['days'], right_on = ['date'])
python pandas dataframe
I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.
Given below is how df1
is:
days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018
Given below is how df2
is:
date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101
I am trying to create a new Dataframe as below:
date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101
Days when a value exists in df2
, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates
new_df = pd.merge(df1, df2, how='left', left_on=['days'], right_on = ['date'])
python pandas dataframe
python pandas dataframe
edited Jan 22 at 17:50
Scott Boston
58.3k73258
58.3k73258
asked Jan 3 at 9:21
dark horsedark horse
16210
16210
it will because you have done left join, and your left table isdf1
which contains all the days and hence working as it should work!!
– Rahul Agarwal
Jan 3 at 9:26
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35
add a comment |
it will because you have done left join, and your left table isdf1
which contains all the days and hence working as it should work!!
– Rahul Agarwal
Jan 3 at 9:26
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35
it will because you have done left join, and your left table is
df1
which contains all the days and hence working as it should work!!– Rahul Agarwal
Jan 3 at 9:26
it will because you have done left join, and your left table is
df1
which contains all the days and hence working as it should work!!– Rahul Agarwal
Jan 3 at 9:26
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35
add a comment |
1 Answer
1
active
oldest
votes
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output
days date emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0
If you want a sort of indicator column, do this, instead:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output
days emp_id marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54019363%2fpandas-performing-left-join-across-2-dataframe-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output
days date emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0
If you want a sort of indicator column, do this, instead:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output
days emp_id marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
add a comment |
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output
days date emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0
If you want a sort of indicator column, do this, instead:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output
days emp_id marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
add a comment |
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output
days date emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0
If you want a sort of indicator column, do this, instead:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output
days emp_id marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0
You could do something like this:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)
Output
days date emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0
If you want a sort of indicator column, do this, instead:
new_df = pd.merge(df1, df2, how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)
Output
days emp_id marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0
answered Jan 3 at 9:36
Daniel MesejoDaniel Mesejo
18.8k21533
18.8k21533
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
add a comment |
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.
– dark horse
Jan 3 at 9:44
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…
– Daniel Mesejo
Jan 3 at 9:45
thanks will check it out..
– dark horse
Jan 3 at 10:00
thanks will check it out..
– dark horse
Jan 3 at 10:00
1
1
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
indicator=True parameter is better here :)
– jezrael
Jan 3 at 10:29
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..
– dark horse
Jan 3 at 12:39
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54019363%2fpandas-performing-left-join-across-2-dataframe-in-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
it will because you have done left join, and your left table is
df1
which contains all the days and hence working as it should work!!– Rahul Agarwal
Jan 3 at 9:26
@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe
– dark horse
Jan 3 at 9:28
is marked 0 -1 or a date?
– Daniel Mesejo
Jan 3 at 9:34
@DanielMesejo, if it could return a Date it should be fine..
– dark horse
Jan 3 at 9:35