Using Stats to provide a score for a daily rating with low sample size

Context



We have a register of people in our database, with a record of teams and leaders. Every day we send a survey to each team asking its members whether their leader performed a specific duty. For example, a team leader might have 5 team members reporting to them. Each day those 5 members receive a survey asking whether the leader performed the duty.



The reports may choose whether or not to respond to the survey, and so for any one day you may receive responses from all 5 reports, or from only 1 report, or from 0 reports.



The result from each report gives you a binary decision, 1 or 0, of whether the leader performed the duty on that day.



Moving beyond this example, each leader will have a different team size.



In addition, in each survey we can actually ask the team member whether the leader performed more than one duty. So we could ask the report about 1 duty, or 2 duties or 3 duties, or even more. Again, for each duty we have a true or false, 1 or 0, decision regarding whether the leader performed that duty, with a set of decisions for each report who answered the survey.



Problem



We want to be able to give a rating to a leader on how they performed for any one survey. Each leader has a team size, and for each survey only a certain proportion of that team size will actually respond. The response rate could be anywhere from 0% to 100%.



Ignoring the case of a 0% participation rate, how do we give the leader a score for their performance for the day?



We would ideally like to use only the variables that are available for that day – so not using any historical data from past surveys, but instead only the current survey’s results.



We would ideally like to consider participation rate in the scoring somehow as well, so that the results from a day with high confidence (full participation rate) are not somehow at a disadvantage to a day with low confidence (low participation rate).



The score itself could take the form of a 0 to 1 or 0 to 100 scale, with a threshold applied to transform the score into a specific category e.g. “Good”, “Neutral”, “Bad”, or it could go directly to a categorical score.



In addition, the scores that people receive should be perceived as fair, as they will have an effect on motivation. We want each score to be independent of other team leaders' scores, meaning that we do not want to force the scores into some pre-specified distribution, as grading on a curve does (https://en.wikipedia.org/wiki/Grading_on_a_curve).



Options I've considered



There are some really obvious ones like:




  • Number of Votes / Team Size (for each duty, and then average across all duties), then apply a threshold to the percentage - [possibly our best option right now, but participation is usually toward the lower end, meaning this would make it hard to score well without getting participation higher]

  • Number of Votes / Number of Survey Participants (for each duty, and then average across all duties), then apply a threshold to the percentage - [this makes it easier to get 100 if fewer people participate in the survey]

  • Number of Votes by itself with a threshold on the total count - [this disadvantages small teams, as they can get fewer possible votes]
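
To make the comparison concrete, here is a minimal Python sketch of the three options for a single duty. The function and parameter names (`yes_votes`, `respondents`, `team_size`) are hypothetical labels for the three variables, and "votes" is assumed to mean responses of 1; none of this is taken from our actual system.

```python
def score_by_team_size(yes_votes: int, team_size: int) -> float:
    """Option 1: share of the whole team that confirmed the duty."""
    return yes_votes / team_size


def score_by_respondents(yes_votes: int, respondents: int) -> float:
    """Option 2: share of survey participants that confirmed the duty."""
    return yes_votes / respondents


def score_by_count(yes_votes: int) -> int:
    """Option 3: raw count of confirmations, to be compared with a fixed threshold."""
    return yes_votes


# Hypothetical day: team of 5, 2 members respond, both say the duty was performed.
team_size, respondents, yes_votes = 5, 2, 2
option_1 = score_by_team_size(yes_votes, team_size)      # penalised by low participation
option_2 = score_by_respondents(yes_votes, respondents)  # perfect score from only 2 answers
option_3 = score_by_count(yes_votes)                      # capped by the size of the team
print(option_1, option_2, option_3)
```

With several duties, each function would be applied per duty and the results averaged, as described in the list above.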


None of these takes all three variables into account at the same time. I have considered whether confidence intervals would be appropriate, but our team sizes vary from 1 person to about 30, so I think the population size is too small to use them (correct me if I'm wrong). For that reason I believe Evan Miller's "How Not To Sort By Average Rating" doesn't apply well in my case. In addition, these aren't ratings, but rather occurrences of particular events, which are independent of each other.
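
For reference, the approach in that article is the lower bound of the Wilson score interval for a binomial proportion. Below is a minimal sketch of it, assuming a 95% level (z = 1.96) and treating the respondents as the sample; the function name is hypothetical, and I am not claiming this is the right fit for our case. Note that it only uses two of the three variables: team size does not appear.

```python
import math


def wilson_lower_bound(yes_votes: int, respondents: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion of 'yes' answers.

    Assumes respondents > 0 (the 0% participation case is out of scope here).
    """
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    n = respondents
    p_hat = yes_votes / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # Clamp at 0 to guard against tiny negative values from rounding.
    return max(0.0, (centre - margin) / (1 + z * z / n))


# Same observed proportion (100% yes), different numbers of respondents.
two_of_two = wilson_lower_bound(2, 2)
five_of_five = wilson_lower_bound(5, 5)
print(two_of_two, five_of_five)
```

With the same observed proportion, the bound is higher when more people respond, which is the behaviour I want from participation; whether the interval is meaningful at these sample sizes is exactly what I am unsure about.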



What am I missing here that would be a perfect fit?

probability analysis scoring-algorithm

asked 15 hours ago by tastychocolatemilk