pandas to find earliest occurrence of statement and set to starter























Consider the following df



import numpy as np
import pandas as pd

data = {'Name' : ['John','John','Lucy','Lucy','Lucy'],
'Payroll' : [15,15,75,75,75],
'Week' : [1,2,1,2,3]}
df = pd.DataFrame(data)

Name Payroll Week
0 John 15 1
1 John 15 2
2 Lucy 75 1
3 Lucy 75 2
4 Lucy 75 3


What I'm attempting to do is apply a Boolean mask across a DataFrame very similar to this one, with 2m+ rows and 20+ columns, to find out when someone started.



To find if someone is active or not I pass a condition to another df:



df2 = df.loc[df.Week == df.Week.max()]


This gives me the rows for the final week. I then use an isin filter to find out whether the person is active or has left:



df['Status'] = np.where(df['Payroll'].isin(df2['Payroll']), 'Active','Leaver')


Using the above code I get the following, which is great: it tells me that, since John is not in the latest week, he has left the company.



Name    Payroll Week    Status
0 John 15 1 Leaver
1 John 15 2 Leaver
2 Lucy 75 1 Active
3 Lucy 75 2 Active
4 Lucy 75 3 Active


What I'm trying to achieve is to know when John started with us. I could try a mask for each week of the year and an isin to check when each person first appeared, but I figured there must be a more pythonic way to do this!



Desired output:



 Name   Payroll Week    Status
0 John 15 1 Starter
1 John 15 2 Leaver
2 Lucy 75 1 Starter
3 Lucy 75 2 Active
4 Lucy 75 3 Active


Any help is much appreciated.



Edit for clarity:



data = {'Name' : ['John','John','John','John','Lucy','Lucy','Lucy','Lucy','Lucy'],
'Payroll' : [15,15,15,15,75,75,75,75,75],
'Week' : [1,2,3,4,1,2,3,4,5]}

df = pd.DataFrame(data)


Desired output:



    Name    Payroll Week    Status
0 John 15 1 Starter
1 John 15 2 Active
2 John 15 3 Active
3 John 15 4 Leaver
4 Lucy 75 1 Starter
5 Lucy 75 2 Active
6 Lucy 75 3 Active
7 Lucy 75 4 Active
8 Lucy 75 5 Active


Things to note:



Max week is 5, so anyone not present in week 5 is a leaver (flagged on their last week, as in the desired output above).



The first week a person appears in the df makes them a starter.



All weeks in between are set to Active.
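The three rules above can be sketched without any per-week masks. This is only one possible reading of the rules, not code from the original post: it assumes `Payroll` uniquely identifies a person, and it lets Starter take priority over Leaver when a person has a single row. The names `first_week`, `last_week`, `is_starter` and `is_leaver` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John'] * 4 + ['Lucy'] * 5,
    'Payroll': [15] * 4 + [75] * 5,
    'Week': [1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Per-person first and last week, broadcast back onto every row.
weeks = df.groupby('Payroll')['Week']
first_week = weeks.transform('min')
last_week = weeks.transform('max')

is_starter = df['Week'].eq(first_week)
# A leaver row is a person's final week when that week is before the overall max.
is_leaver = df['Week'].eq(last_week) & last_week.lt(df['Week'].max())

# np.select uses the first matching condition, so Starter wins over Leaver.
df['Status'] = np.select([is_starter, is_leaver], ['Starter', 'Leaver'], default='Active')
print(df)
```

Because the comparison is made on the `Week` values themselves, this sketch does not depend on the rows being sorted.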










      python pandas






asked yesterday by Datanovice (edited yesterday)
























2 Answers



Accepted answer (3 votes):










Use numpy.select, adding new conditions built with duplicated:



# Payroll values that are present in the latest week
a = df.loc[df.Week == df.Week.max(), 'Payroll']
m1 = ~df['Payroll'].isin(a)                   # person is absent from the latest week
m2 = ~df['Payroll'].duplicated()              # first row for each Payroll
m3 = ~df['Payroll'].duplicated(keep='last')   # last row for each Payroll

# Conditions are checked in order; rows matching neither are 'Active'
df['Status'] = np.select([m2, m1 & m3], ['Starter', 'Leaver'], 'Active')
print(df)

Name Payroll Week Status
0 John 15 1 Starter
1 John 15 2 Active
2 John 15 3 Active
3 John 15 4 Leaver
4 Lucy 75 1 Starter
5 Lucy 75 2 Active
6 Lucy 75 3 Active
7 Lucy 75 4 Active
8 Lucy 75 5 Active
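One caveat worth checking, as an observation rather than part of the answer itself: `duplicated` marks the first and last rows by position, so the masks only line up with the earliest and latest week if each person's rows are already in week order. A minimal sketch, assuming the data may arrive unsorted (the shuffled sample frame and the names `first_row` and `last_row` are made up for illustration):

```python
import pandas as pd

# Same people as in the question, but with rows deliberately out of week order.
df = pd.DataFrame({
    'Name': ['Lucy', 'John', 'Lucy', 'John', 'Lucy'],
    'Payroll': [75, 15, 75, 15, 75],
    'Week': [3, 2, 1, 1, 2],
})

# Sort so each person's rows run from earliest to latest week.
df = df.sort_values(['Payroll', 'Week']).reset_index(drop=True)

first_row = ~df['Payroll'].duplicated()             # keep='first' is the default
last_row = ~df['Payroll'].duplicated(keep='last')

print(df.assign(first_row=first_row, last_row=last_row))
```

After the sort, `first_row` and `last_row` can play the roles of `m2` and `m3` in the `np.select` call.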





• Hey Jozi! This is close, but Lucy would be set as Active as she's present in the df in the final week.
  – Datanovice
  yesterday

• I don't think so, as it's possible for people to have the same names; however, the payroll column has unique values per person.
  – Datanovice
  yesterday

• @Datanovice - exactly, you are right.
  – jezrael
  yesterday

• @Datanovice - I think the best option would be to change the sample data; is that possible? Can you add 2-3 new rows?
  – jezrael
  yesterday

• Beautiful, you are the man!
  – Datanovice
  yesterday


















Second answer (2 votes):













The simplest way that I have come across is to use groupby and find the minimal index for each name's group:



for _, dfg in df.groupby(df['Name']):
    # smallest index label in the group, i.e. the person's first row
    gidx = min(dfg.index)
    df.loc[df.index == gidx, 'Status'] = 'Starter'

print(df)


And the df is then:



Name Payroll Week Status
0 John 15 1 Starter
1 John 15 2 Leaver
2 Lucy 75 1 Starter
3 Lucy 75 2 Active
4 Lucy 75 3 Active





• Awesome! Let me test this, currently running on my lovely work laptop!
  – Datanovice
  yesterday

• Mind the correction (proper indexing this time) :)
  – sophros
  yesterday

• Additionally, could this be used to find the last occurrence of an item in a df? I'd like to note someone as active until they are a leaver in their max week, if it's less than the dataframe's max week.
  – Datanovice
  yesterday

• And a final comment: is there any way to do this without a for loop? I've always been reprimanded here for using them, as pandas uses 2D vector arrays and doesn't work well with loops (especially as my df is 2 million rows+).
  – Datanovice
  yesterday

• You can change the min to max and you have the last entry. With this the changes are minimal. In terms of the for loop, there is another approach I will post in a moment.
  – sophros
  yesterday
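The follow-up approach mentioned in the last comment does not appear in this thread. As a hedged sketch of one loop-free way to get both the first and the last row per person, `idxmin` and `idxmax` on the grouped `Week` column return one index label per group. Grouping by `Payroll` rather than `Name` is an assumption based on the asker's remark that payroll values are unique per person; `first_idx` and `last_idx` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Lucy', 'Lucy', 'Lucy'],
    'Payroll': [15, 15, 75, 75, 75],
    'Week': [1, 2, 1, 2, 3],
})
# Status as produced by the asker's existing Active/Leaver step.
df['Status'] = ['Leaver', 'Leaver', 'Active', 'Active', 'Active']

# Index label of each person's earliest and latest week, one per group.
first_idx = df.groupby('Payroll')['Week'].idxmin()
last_idx = df.groupby('Payroll')['Week'].idxmax()

# Assign to all first rows in a single vectorised .loc call (no Python loop).
df.loc[first_idx.to_numpy(), 'Status'] = 'Starter'
print(df)
print(last_idx.tolist())
```

Here `last_idx` gives the "last occurrence" rows asked about above; it could be used the same way to set a status on each person's final week.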











Accepted answer: answered yesterday by jezrael (edited yesterday)












Second answer: answered yesterday by sophros (edited yesterday)











