Choose dictionary keys only if their values don't have a certain number of duplicates












Given a dictionary and a limit on the number of keys, I would like to build a new dictionary that contains the keys with the highest values.

The given dict is:

    dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}

The new dictionary should hold at most limit keys, taken from the highest values.

For instance, for limit=1 the new dict is

    {'apple': 5}

and for limit=2

    {'apple': 5, 'pears': 4}

I tried this:

    return dict(sorted(dictation.items(), key=lambda x: -x[1])[:limit])

but when I try limit=3, I get

    {'apple': 5, 'pears': 4, 'orange': 3}

But it shouldn't include orange:3, because orange and kiwi have the same priority: including both would exceed the limit, so neither should be included. It should return

    {'apple': 5, 'pears': 4}
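To make the problem concrete, here is a minimal sketch (not part of the original attempt) that shows why the plain slice is not enough: at limit=3 the cut falls between two items with the same value. The names ranked, kept, dropped and tie_is_split are only illustrative.

```python
dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
limit = 3

# Same ordering as in the attempt above: highest values first.
ranked = sorted(dictation.items(), key=lambda x: -x[1])
kept, dropped = ranked[:limit], ranked[limit:]

# If the last kept value equals the first dropped value,
# the slice has cut a group of equal values in half.
tie_is_split = bool(dropped) and kept[-1][1] == dropped[0][1]

print(kept)
print(tie_is_split)
```

When tie_is_split is true, the desired result drops every item that shares the value at the cut.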






python dictionary






edited Nov 21 '18 at 18:55 by Conner
asked Nov 21 '18 at 18:43 by Comp








  • I followed this question until you said "because I can't add orange. If I add it will be more than the limit.".
    – timgeb, Nov 21 '18 at 18:47

  • So you're saying if there are multiple items with the same count, you should only take any if all of them fit within the limit? Have you looked into whether Counter.most_common does what you need? I'd recommend not trying to fit it into one line.
    – jonrsharpe, Nov 21 '18 at 18:49














3 Answers


















One way to go is to use collections.Counter and most_common. Fetch one item more than the limit (n+1); if that many exist, keep popping from the end until the last value changes:

    from collections import Counter

    dct = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    n = 3

    # Fetch one item more than the limit to see whether a tie crosses it.
    items = Counter(dct).most_common(n + 1)
    if len(items) > n:
        # The (n+1)th value marks a group that does not fit: drop all of it.
        last_val = items[-1][1]
        while items and items[-1][1] == last_val:
            items.pop()

    new = dict(items)
    # {'apple': 5, 'pears': 4}
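As a rough sketch, the same idea can be wrapped in a function so that it is easy to try with several limits. The function name top_without_split_ties and the guard for non-positive limits are assumptions added here, not part of the answer.

```python
from collections import Counter


def top_without_split_ties(mapping, limit):
    """Return the highest-valued items without splitting a group of equal values."""
    if limit <= 0:
        return {}
    # One extra item tells us whether a tie straddles the limit.
    items = Counter(mapping).most_common(limit + 1)
    if len(items) > limit:
        cutoff = items[-1][1]
        # Drop every trailing item that shares the cut-off value.
        while items and items[-1][1] == cutoff:
            items.pop()
    return dict(items)


dct = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for limit in range(1, 6):
    print(limit, top_without_split_ties(dct, limit))
```

The `while items and ...` check keeps the loop from indexing an empty list when every fetched item shares the same value.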






edited Nov 21 '18 at 18:54
answered Nov 21 '18 at 18:49 by schwobaseggl













  • But OP says for limit=3 the dict should still have 2 keys, for reasons I don't understand.
    – timgeb, Nov 21 '18 at 18:50

  • @timgeb I added the necessary bumpiness. Lost all of its appeal :(
    – schwobaseggl, Nov 21 '18 at 18:57

  • still shorter than mine
    – Patrick Artner, Nov 21 '18 at 19:03



















This is not very efficient, but it works. It builds a Counter to get the items sorted by value, and an inverted defaultdict that maps each score to the list of keys with that score. The result is assembled from both by checking whether each whole group still fits under the limit:

    from collections import defaultdict, Counter

    def gimme(d, n):
        c = Counter(d)
        # Invert the mapping: value -> list of keys sharing that value.
        grpd = defaultdict(list)
        for key, value in c.items():
            grpd[value].append(key)

        result = {}
        for key, value in c.most_common():
            if key in result:
                continue  # already added together with its group
            if len(grpd[value]) + len(result) <= n:
                result.update({k: value for k in grpd[value]})
            else:
                break
        return result

Test:

    data = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}

    for k in range(10):
        print(k, gimme(data, k))

Output:

    0 {}
    1 {'apple': 5}
    2 {'apple': 5, 'pears': 4}
    3 {'apple': 5, 'pears': 4}
    4 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3}
    5 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    6 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    7 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    8 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
    9 {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
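A possible variation on the grouping idea, sketched here with itertools.groupby rather than the defaultdict used above: sort once, walk the groups of equal values, and stop at the first group that no longer fits. The function name take_whole_groups is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def take_whole_groups(mapping, limit):
    """Add groups of equal values, highest first, while the whole group fits."""
    ranked = sorted(mapping.items(), key=itemgetter(1), reverse=True)
    result = {}
    # groupby yields runs of consecutive items with the same value.
    for _, group in groupby(ranked, key=itemgetter(1)):
        group = list(group)
        if len(result) + len(group) > limit:
            break  # this group would exceed the limit, so stop here
        result.update(group)
    return result


data = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for k in range(6):
    print(k, take_whole_groups(data, k))
```

Because each group is handled exactly once, there is no need to track which keys were already added.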






answered Nov 21 '18 at 19:01 by Patrick Artner























As you note, taking the top n does not by default exclude equal values that would exceed the stated cap. This is by design.

The trick is to consider the (n+1)th highest value and ensure the values in your dictionary are all strictly higher than this number:

    from heapq import nlargest

    dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}

    n = 3
    # Take one item beyond the limit; its value is the cut-off.
    largest_items = nlargest(n + 1, dictation.items(), key=lambda x: x[1])
    n_plus_one_value = largest_items[-1][1]

    # Keep only items strictly above the cut-off, so a tie is never split.
    res = {k: v for k, v in largest_items if v > n_plus_one_value}

    print(res)
    # {'apple': 5, 'pears': 4}

We assume here that len(dictation) > n, so that nlargest returns n+1 items; otherwise you can just take the input dictionary as the result.

The dictionary comprehension tests every item in largest_items. For larger inputs, you can locate the cut-off with bisect instead, something like:

    from heapq import nlargest
    from operator import itemgetter
    from bisect import bisect

    dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}

    n = 3
    largest_items = nlargest(n + 1, dictation.items(), key=lambda x: x[1])
    n_plus_one_value = largest_items[-1][1]

    # Reverse the values into ascending order, as bisect requires; the result
    # counts how many trailing items have a value equal to the cut-off.
    index = bisect(list(map(itemgetter(1), largest_items))[::-1], n_plus_one_value)

    res = dict(largest_items[:len(largest_items) - index])

    print(res)
    # {'apple': 5, 'pears': 4}
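The assumption about the input size can be folded into a small helper. This is only a sketch: the function name top_strictly_above_cutoff and the handling of non-positive limits are assumptions added here, while nlargest is used exactly as in the answer.

```python
from heapq import nlargest
from operator import itemgetter


def top_strictly_above_cutoff(mapping, limit):
    """Keep items whose value is strictly above the (limit+1)th highest value."""
    if limit <= 0:
        return {}
    if limit >= len(mapping):
        # Everything fits, so there is no cut-off value to compare against.
        return dict(mapping)
    largest = nlargest(limit + 1, mapping.items(), key=itemgetter(1))
    cutoff = largest[-1][1]
    return {k: v for k, v in largest if v > cutoff}


dictation = {'apple': 5, 'pears': 4, 'orange': 3, 'kiwi': 3, 'banana': 1}
for n in range(1, 7):
    print(n, top_strictly_above_cutoff(dictation, n))
```

Checking limit >= len(mapping) first guarantees that nlargest really returns limit+1 items, so largest[-1] is always the item just past the limit.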






edited Nov 22 '18 at 2:28
answered Nov 21 '18 at 19:03 by jpp





























