Pandas - Performing left join across 2 Dataframe in Pandas





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.



Given below is how df1 is:



days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018


Given below is how df2 is:



date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101


I am trying to create a new Dataframe as below:



date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101


Days when a value exists in df2, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates



new_df = pd.merge(df1, df2,  how='left', left_on=['days'], right_on = ['date'])









share|improve this question

























  • it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

    – Rahul Agarwal
    Jan 3 at 9:26











  • @RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

    – dark horse
    Jan 3 at 9:28











  • is marked 0 -1 or a date?

    – Daniel Mesejo
    Jan 3 at 9:34











  • @DanielMesejo, if it could return a Date it should be fine..

    – dark horse
    Jan 3 at 9:35


















0















I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.



Given below is how df1 is:



days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018


Given below is how df2 is:



date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101


I am trying to create a new Dataframe as below:



date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101


Days when a value exists in df2, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates



new_df = pd.merge(df1, df2,  how='left', left_on=['days'], right_on = ['date'])









share|improve this question

























  • it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

    – Rahul Agarwal
    Jan 3 at 9:26











  • @RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

    – dark horse
    Jan 3 at 9:28











  • is marked 0 -1 or a date?

    – Daniel Mesejo
    Jan 3 at 9:34











  • @DanielMesejo, if it could return a Date it should be fine..

    – dark horse
    Jan 3 at 9:35














0












0








0








I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.



Given below is how df1 is:



days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018


Given below is how df2 is:



date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101


I am trying to create a new Dataframe as below:



date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101


Days when a value exists in df2, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates



new_df = pd.merge(df1, df2,  how='left', left_on=['days'], right_on = ['date'])









share|improve this question
















I have two dataframes (one that has list of all days in a month and the other than has days when a staff marked attendance) and I am trying to perform a left join such that I have a new Dataframe with all dates and with dates when employee did and did not mark attendance.



Given below is how df1 is:



days
01-01-2018
02-01-2018
03-01-2018
04-01-2018
05-01-2018
06-01-2018
07-01-2018


Given below is how df2 is:



date, emp_id
01-01-2018,101
03-01-2018,101
04-01-2018,101
06-01-2018,101


I am trying to create a new Dataframe as below:



date,marked,emp_id
01-01-2018,01-01-2018,101
02-01-2018,02-01-2018,101
03-01-2018,03-01-2018,101
04-01-2018,04-01-2018,101
05-01-2018,05-01-2018,101
06-01-2018,06-01-2018,101


Days when a value exists in df2, the new Dataframe shall have a valid date if the date exists in df1 and df2 else it should be null. I tried doing the below but I see it returns all dates



new_df = pd.merge(df1, df2,  how='left', left_on=['days'], right_on = ['date'])






python pandas dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 22 at 17:50









Scott Boston

58.3k73258




58.3k73258










asked Jan 3 at 9:21









dark horsedark horse

16210




16210













  • it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

    – Rahul Agarwal
    Jan 3 at 9:26











  • @RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

    – dark horse
    Jan 3 at 9:28











  • is marked 0 -1 or a date?

    – Daniel Mesejo
    Jan 3 at 9:34











  • @DanielMesejo, if it could return a Date it should be fine..

    – dark horse
    Jan 3 at 9:35



















  • it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

    – Rahul Agarwal
    Jan 3 at 9:26











  • @RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

    – dark horse
    Jan 3 at 9:28











  • is marked 0 -1 or a date?

    – Daniel Mesejo
    Jan 3 at 9:34











  • @DanielMesejo, if it could return a Date it should be fine..

    – dark horse
    Jan 3 at 9:35

















it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

– Rahul Agarwal
Jan 3 at 9:26





it will because you have done left join, and your left table is df1 which contains all the days and hence working as it should work!!

– Rahul Agarwal
Jan 3 at 9:26













@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

– dark horse
Jan 3 at 9:28





@RahulAgarwal, agree but when date value in df2 does not exist it still returns the rows and copies the date from df1 to df2. For example, as shown in the above example it is seen on Jan 2, there is no value in df2 but it is returned in the new Dataframe

– dark horse
Jan 3 at 9:28













is marked 0 -1 or a date?

– Daniel Mesejo
Jan 3 at 9:34





is marked 0 -1 or a date?

– Daniel Mesejo
Jan 3 at 9:34













@DanielMesejo, if it could return a Date it should be fine..

– dark horse
Jan 3 at 9:35





@DanielMesejo, if it could return a Date it should be fine..

– dark horse
Jan 3 at 9:35












1 Answer
1






active

oldest

votes


















1














You could do something like this:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)


Output



        days       date  emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0


If you want a sort of indicator column, do this, instead:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)


Output



        days  emp_id  marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0





share|improve this answer
























  • Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

    – dark horse
    Jan 3 at 9:44











  • @darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

    – Daniel Mesejo
    Jan 3 at 9:45













  • thanks will check it out..

    – dark horse
    Jan 3 at 10:00






  • 1





    indicator=True parameter is better here :)

    – jezrael
    Jan 3 at 10:29











  • @jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

    – dark horse
    Jan 3 at 12:39












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54019363%2fpandas-performing-left-join-across-2-dataframe-in-pandas%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














You could do something like this:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)


Output



        days       date  emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0


If you want a sort of indicator column, do this, instead:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)


Output



        days  emp_id  marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0





share|improve this answer
























  • Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

    – dark horse
    Jan 3 at 9:44











  • @darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

    – Daniel Mesejo
    Jan 3 at 9:45













  • thanks will check it out..

    – dark horse
    Jan 3 at 10:00






  • 1





    indicator=True parameter is better here :)

    – jezrael
    Jan 3 at 10:29











  • @jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

    – dark horse
    Jan 3 at 12:39
















1














You could do something like this:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)


Output



        days       date  emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0


If you want a sort of indicator column, do this, instead:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)


Output



        days  emp_id  marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0





share|improve this answer
























  • Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

    – dark horse
    Jan 3 at 9:44











  • @darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

    – Daniel Mesejo
    Jan 3 at 9:45













  • thanks will check it out..

    – dark horse
    Jan 3 at 10:00






  • 1





    indicator=True parameter is better here :)

    – jezrael
    Jan 3 at 10:29











  • @jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

    – dark horse
    Jan 3 at 12:39














1












1








1







You could do something like this:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)


Output



        days       date  emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0


If you want a sort of indicator column, do this, instead:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)


Output



        days  emp_id  marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0





share|improve this answer













You could do something like this:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date'])
new_df = new_df.fillna({'emp_id': 101.0})
print(new_df)


Output



        days       date  emp_id
0 2018-01-01 2018-01-01 101.0
1 2018-01-02 NaT 101.0
2 2018-01-03 2018-01-03 101.0
3 2018-01-04 2018-01-04 101.0
4 2018-01-05 NaT 101.0
5 2018-01-06 2018-01-06 101.0
6 2018-01-07 NaT 101.0


If you want a sort of indicator column, do this, instead:



new_df = pd.merge(df1, df2,  how='outer', left_on=['days'], right_on = ['date']).fillna({'emp_id': 101.0})
new_df['marked'] = (new_df.days == new_df.date).astype(np.uint8)
new_df = new_df.drop('date', axis=1)
print(new_df)


Output



        days  emp_id  marked
0 2018-01-01 101.0 1
1 2018-01-02 101.0 0
2 2018-01-03 101.0 1
3 2018-01-04 101.0 1
4 2018-01-05 101.0 0
5 2018-01-06 101.0 1
6 2018-01-07 101.0 0






share|improve this answer












share|improve this answer



share|improve this answer










answered Jan 3 at 9:36









Daniel MesejoDaniel Mesejo

18.8k21533




18.8k21533













  • Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

    – dark horse
    Jan 3 at 9:44











  • @darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

    – Daniel Mesejo
    Jan 3 at 9:45













  • thanks will check it out..

    – dark horse
    Jan 3 at 10:00






  • 1





    indicator=True parameter is better here :)

    – jezrael
    Jan 3 at 10:29











  • @jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

    – dark horse
    Jan 3 at 12:39



















  • Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

    – dark horse
    Jan 3 at 9:44











  • @darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

    – Daniel Mesejo
    Jan 3 at 9:45













  • thanks will check it out..

    – dark horse
    Jan 3 at 10:00






  • 1





    indicator=True parameter is better here :)

    – jezrael
    Jan 3 at 10:29











  • @jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

    – dark horse
    Jan 3 at 12:39

















Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

– dark horse
Jan 3 at 9:44





Mesojo, thanks for the feedback, just one concern I have is what if I have a list of emp_id how could I modify the fillna method in the above. Could you please advice.

– dark horse
Jan 3 at 9:44













@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

– Daniel Mesejo
Jan 3 at 9:45







@darkhorse - that will depend on your application, there are a few methods, you could see: pandas.pydata.org/pandas-docs/stable/generated/…

– Daniel Mesejo
Jan 3 at 9:45















thanks will check it out..

– dark horse
Jan 3 at 10:00





thanks will check it out..

– dark horse
Jan 3 at 10:00




1




1





indicator=True parameter is better here :)

– jezrael
Jan 3 at 10:29





indicator=True parameter is better here :)

– jezrael
Jan 3 at 10:29













@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

– dark horse
Jan 3 at 12:39





@jezrael, tried doing pd.merge(df1, df2, how='outer', indicator=True, on='days') and I saw only common fields returned back. The missing rows did not get returned..

– dark horse
Jan 3 at 12:39




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54019363%2fpandas-performing-left-join-across-2-dataframe-in-pandas%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

android studio warns about leanback feature tag usage required on manifest while using Unity exported app?

SQL update select statement

'app-layout' is not a known element: how to share Component with different Modules