Selecting row from a group on highest score based on two columns












2















Data



     Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
1 Date Docum 4 3 269004158
2 As of Dat 4 1 269004158
3 Date Docum 5 3 345973060
4 x Indicate 4 1 372529352
5 Date Docum 5 3 372529352
6 1 Financial 9 1 372529352
7 020 per shar 2 0 372529352
8 Date $ in 8 1 372529352
9 Date $ in 9 4 372529352
10 4 --------- 4 1 372529352
11 Date Begin 1 0 372529352


Required Output



       Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
9 Date $ in 9 4 372529352



Objective




Group by version id, get the row with max Score_unigram, if results are more than one, then check the Score_Bigram column and get the row with the highest value (If there are more than one such rows return all)




What have I tried




maximum = 0
index_to_pick =

for index,row_data in a.iterrows():
if row_data['Score_Unigram'] > maximum:
maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick.append(index)

elif row_data['Score_Unigram'] == maximum:
if row_data['Score_Bigram'] > score_bigram:

maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick =
index_to_pick.append(index)

elif row_data['Score_Bigram'] == score_bigram:
index_to_pick.append(index)

a.loc[[index_to_pick[0]]]


Output



       Sentence  Score_Unigram  Score_Bigram  versionId
5 Date $ in 9 4 372529352


Okay the approach is not pretty i guess (since data is large), looking for a efficient one.
I tried idxmax but that returns the only the top one. Might be a duplicate but wasn't able to find one. Thanks for the help!!.










share|improve this question

























  • df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

    – anky_91
    Jan 2 at 14:25


















2















Data



     Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
1 Date Docum 4 3 269004158
2 As of Dat 4 1 269004158
3 Date Docum 5 3 345973060
4 x Indicate 4 1 372529352
5 Date Docum 5 3 372529352
6 1 Financial 9 1 372529352
7 020 per shar 2 0 372529352
8 Date $ in 8 1 372529352
9 Date $ in 9 4 372529352
10 4 --------- 4 1 372529352
11 Date Begin 1 0 372529352


Required Output



       Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
9 Date $ in 9 4 372529352



Objective




Group by version id, get the row with max Score_unigram, if results are more than one, then check the Score_Bigram column and get the row with the highest value (If there are more than one such rows return all)




What have I tried




maximum = 0
index_to_pick =

for index,row_data in a.iterrows():
if row_data['Score_Unigram'] > maximum:
maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick.append(index)

elif row_data['Score_Unigram'] == maximum:
if row_data['Score_Bigram'] > score_bigram:

maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick =
index_to_pick.append(index)

elif row_data['Score_Bigram'] == score_bigram:
index_to_pick.append(index)

a.loc[[index_to_pick[0]]]


Output



       Sentence  Score_Unigram  Score_Bigram  versionId
5 Date $ in 9 4 372529352


Okay the approach is not pretty i guess (since data is large), looking for a efficient one.
I tried idxmax but that returns the only the top one. Might be a duplicate but wasn't able to find one. Thanks for the help!!.










share|improve this question

























  • df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

    – anky_91
    Jan 2 at 14:25
















2












2








2


1






Data



     Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
1 Date Docum 4 3 269004158
2 As of Dat 4 1 269004158
3 Date Docum 5 3 345973060
4 x Indicate 4 1 372529352
5 Date Docum 5 3 372529352
6 1 Financial 9 1 372529352
7 020 per shar 2 0 372529352
8 Date $ in 8 1 372529352
9 Date $ in 9 4 372529352
10 4 --------- 4 1 372529352
11 Date Begin 1 0 372529352


Required Output



       Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
9 Date $ in 9 4 372529352



Objective




Group by version id, get the row with max Score_unigram, if results are more than one, then check the Score_Bigram column and get the row with the highest value (If there are more than one such rows return all)




What have I tried




maximum = 0
index_to_pick =

for index,row_data in a.iterrows():
if row_data['Score_Unigram'] > maximum:
maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick.append(index)

elif row_data['Score_Unigram'] == maximum:
if row_data['Score_Bigram'] > score_bigram:

maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick =
index_to_pick.append(index)

elif row_data['Score_Bigram'] == score_bigram:
index_to_pick.append(index)

a.loc[[index_to_pick[0]]]


Output



       Sentence  Score_Unigram  Score_Bigram  versionId
5 Date $ in 9 4 372529352


Okay the approach is not pretty i guess (since data is large), looking for a efficient one.
I tried idxmax but that returns the only the top one. Might be a duplicate but wasn't able to find one. Thanks for the help!!.










share|improve this question
















Data



     Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
1 Date Docum 4 3 269004158
2 As of Dat 4 1 269004158
3 Date Docum 5 3 345973060
4 x Indicate 4 1 372529352
5 Date Docum 5 3 372529352
6 1 Financial 9 1 372529352
7 020 per shar 2 0 372529352
8 Date $ in 8 1 372529352
9 Date $ in 9 4 372529352
10 4 --------- 4 1 372529352
11 Date Begin 1 0 372529352


Required Output



       Sentence  Score_Unigram  Score_Bigram  versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
9 Date $ in 9 4 372529352



Objective




Group by version id, get the row with max Score_unigram, if results are more than one, then check the Score_Bigram column and get the row with the highest value (If there are more than one such rows return all)




What have I tried




maximum = 0
index_to_pick =

for index,row_data in a.iterrows():
if row_data['Score_Unigram'] > maximum:
maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick.append(index)

elif row_data['Score_Unigram'] == maximum:
if row_data['Score_Bigram'] > score_bigram:

maximum = row_data['Score_Unigram']
score_bigram = row_data['Score_Bigram']
index_to_pick =
index_to_pick.append(index)

elif row_data['Score_Bigram'] == score_bigram:
index_to_pick.append(index)

a.loc[[index_to_pick[0]]]


Output



       Sentence  Score_Unigram  Score_Bigram  versionId
5 Date $ in 9 4 372529352


Okay the approach is not pretty i guess (since data is large), looking for a efficient one.
I tried idxmax but that returns the only the top one. Might be a duplicate but wasn't able to find one. Thanks for the help!!.







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 2 at 14:44







iamklaus

















asked Jan 2 at 14:20









iamklausiamklaus

1,4361511




1,4361511













  • df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

    – anky_91
    Jan 2 at 14:25





















  • df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

    – anky_91
    Jan 2 at 14:25



















df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

– anky_91
Jan 2 at 14:25







df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1) ?

– anky_91
Jan 2 at 14:25














3 Answers
3






active

oldest

votes


















2














Use double filtering by boolean indexing - first by max of first column Score_Unigram and then by Score_Bigram:



df = df[ df['Sentence'].duplicated(keep=False)]
df = df[df.groupby('Sentence')['Score_Unigram'].transform('max') == df['Score_Unigram']]
df = df[df.groupby(['Sentence', 'Score_Unigram'])['Score_Bigram'].transform('max') == df['Score_Bigram']]
print (df)
Sentence Score_Unigram Score_Bigram versionId
0 As of Dat 5 1 269004158
3 Date Docum 5 3 345973060
5 Date Docum 5 3 372529352
9 Date $ in 9 4 372529352





share|improve this answer


























  • hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

    – iamklaus
    Jan 2 at 14:45











  • @iamklaus - Is output correct? Check edited answer.

    – jezrael
    Jan 2 at 14:51






  • 1





    thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

    – iamklaus
    Jan 2 at 14:55



















1














try this on your df :



df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)



Output:



    Sentence     Score_Unigram  Score_Bigram  versionId
5 Date $ in 9 4 372529352





share|improve this answer































    1














    I believe you don't need to sort data, just compare to the max value of those 2 columns



    df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) & 
    (df['Score_Bigram'] == df['Score_Bigram'].max()) ]





    share|improve this answer


























    • It is wrong, check my data.

      – jezrael
      Jan 2 at 14:42











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54007984%2fselecting-row-from-a-group-on-highest-score-based-on-two-columns%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    Use double filtering by boolean indexing - first by max of first column Score_Unigram and then by Score_Bigram:



    df = df[ df['Sentence'].duplicated(keep=False)]
    df = df[df.groupby('Sentence')['Score_Unigram'].transform('max') == df['Score_Unigram']]
    df = df[df.groupby(['Sentence', 'Score_Unigram'])['Score_Bigram'].transform('max') == df['Score_Bigram']]
    print (df)
    Sentence Score_Unigram Score_Bigram versionId
    0 As of Dat 5 1 269004158
    3 Date Docum 5 3 345973060
    5 Date Docum 5 3 372529352
    9 Date $ in 9 4 372529352





    share|improve this answer


























    • hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

      – iamklaus
      Jan 2 at 14:45











    • @iamklaus - Is output correct? Check edited answer.

      – jezrael
      Jan 2 at 14:51






    • 1





      thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

      – iamklaus
      Jan 2 at 14:55
















    2














    Use double filtering by boolean indexing - first by max of first column Score_Unigram and then by Score_Bigram:



    df = df[ df['Sentence'].duplicated(keep=False)]
    df = df[df.groupby('Sentence')['Score_Unigram'].transform('max') == df['Score_Unigram']]
    df = df[df.groupby(['Sentence', 'Score_Unigram'])['Score_Bigram'].transform('max') == df['Score_Bigram']]
    print (df)
    Sentence Score_Unigram Score_Bigram versionId
    0 As of Dat 5 1 269004158
    3 Date Docum 5 3 345973060
    5 Date Docum 5 3 372529352
    9 Date $ in 9 4 372529352





    share|improve this answer


























    • hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

      – iamklaus
      Jan 2 at 14:45











    • @iamklaus - Is output correct? Check edited answer.

      – jezrael
      Jan 2 at 14:51






    • 1





      thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

      – iamklaus
      Jan 2 at 14:55














    2












    2








    2







    Use double filtering by boolean indexing - first by max of first column Score_Unigram and then by Score_Bigram:



    df = df[ df['Sentence'].duplicated(keep=False)]
    df = df[df.groupby('Sentence')['Score_Unigram'].transform('max') == df['Score_Unigram']]
    df = df[df.groupby(['Sentence', 'Score_Unigram'])['Score_Bigram'].transform('max') == df['Score_Bigram']]
    print (df)
    Sentence Score_Unigram Score_Bigram versionId
    0 As of Dat 5 1 269004158
    3 Date Docum 5 3 345973060
    5 Date Docum 5 3 372529352
    9 Date $ in 9 4 372529352





    share|improve this answer















    Use double filtering by boolean indexing - first by max of first column Score_Unigram and then by Score_Bigram:



    df = df[ df['Sentence'].duplicated(keep=False)]
    df = df[df.groupby('Sentence')['Score_Unigram'].transform('max') == df['Score_Unigram']]
    df = df[df.groupby(['Sentence', 'Score_Unigram'])['Score_Bigram'].transform('max') == df['Score_Bigram']]
    print (df)
    Sentence Score_Unigram Score_Bigram versionId
    0 As of Dat 5 1 269004158
    3 Date Docum 5 3 345973060
    5 Date Docum 5 3 372529352
    9 Date $ in 9 4 372529352






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jan 2 at 14:50

























    answered Jan 2 at 14:36









    jezraeljezrael

    351k26314389




    351k26314389













    • hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

      – iamklaus
      Jan 2 at 14:45











    • @iamklaus - Is output correct? Check edited answer.

      – jezrael
      Jan 2 at 14:51






    • 1





      thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

      – iamklaus
      Jan 2 at 14:55



















    • hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

      – iamklaus
      Jan 2 at 14:45











    • @iamklaus - Is output correct? Check edited answer.

      – jezrael
      Jan 2 at 14:51






    • 1





      thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

      – iamklaus
      Jan 2 at 14:55

















    hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

    – iamklaus
    Jan 2 at 14:45





    hey thanks for the reponse, but this might not work ( correct me if i am wrong) for groups right, if there are multiple version ids.. (sorry updated the ques now)

    – iamklaus
    Jan 2 at 14:45













    @iamklaus - Is output correct? Check edited answer.

    – jezrael
    Jan 2 at 14:51





    @iamklaus - Is output correct? Check edited answer.

    – jezrael
    Jan 2 at 14:51




    1




    1





    thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

    – iamklaus
    Jan 2 at 14:55





    thanks works smoothly (small thing groupby had to be done on versionId but anyways works fine, i made the changes for myself)

    – iamklaus
    Jan 2 at 14:55













    1














    try this on your df :



    df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)



    Output:



        Sentence     Score_Unigram  Score_Bigram  versionId
    5 Date $ in 9 4 372529352





    share|improve this answer




























      1














      try this on your df :



      df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)



      Output:



          Sentence     Score_Unigram  Score_Bigram  versionId
      5 Date $ in 9 4 372529352





      share|improve this answer


























        1












        1








        1







        try this on your df :



        df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)



        Output:



            Sentence     Score_Unigram  Score_Bigram  versionId
        5 Date $ in 9 4 372529352





        share|improve this answer













        try this on your df :



        df.sort_values(['Score_Unigram','Score_Bigram'],ascending=False).head(1)



        Output:



            Sentence     Score_Unigram  Score_Bigram  versionId
        5 Date $ in 9 4 372529352






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 2 at 14:30









        anky_91anky_91

        9,6592922




        9,6592922























            1














            I believe you don't need to sort data, just compare to the max value of those 2 columns



            df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) & 
            (df['Score_Bigram'] == df['Score_Bigram'].max()) ]





            share|improve this answer


























            • It is wrong, check my data.

              – jezrael
              Jan 2 at 14:42
















            1














            I believe you don't need to sort data, just compare to the max value of those 2 columns



            df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) & 
            (df['Score_Bigram'] == df['Score_Bigram'].max()) ]





            share|improve this answer


























            • It is wrong, check my data.

              – jezrael
              Jan 2 at 14:42














            1












            1








            1







            I believe you don't need to sort data, just compare to the max value of those 2 columns



            df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) & 
            (df['Score_Bigram'] == df['Score_Bigram'].max()) ]





            share|improve this answer















            I believe you don't need to sort data, just compare to the max value of those 2 columns



            df[ (df['Score_Unigram'] == df['Score_Unigram'].max()) & 
            (df['Score_Bigram'] == df['Score_Bigram'].max()) ]






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jan 2 at 14:39

























            answered Jan 2 at 14:34









            TerryTerry

            452412




            452412













            • It is wrong, check my data.

              – jezrael
              Jan 2 at 14:42



















            • It is wrong, check my data.

              – jezrael
              Jan 2 at 14:42

















            It is wrong, check my data.

            – jezrael
            Jan 2 at 14:42





            It is wrong, check my data.

            – jezrael
            Jan 2 at 14:42


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54007984%2fselecting-row-from-a-group-on-highest-score-based-on-two-columns%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            How to fix TextFormField cause rebuild widget in Flutter

            Npm cannot find a required file even through it is in the searched directory