Count unique words in all text files in a directory, and delete those with fewer than 2?



























This gets me the count of unique words in a file. But how do I delete the files whose count is less than 2?



$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
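For reference, the same count can be wrapped in a small function that lets grep open the file itself instead of reading it through cat. This is a sketch that assumes GNU grep, where `\w` matches letters, digits and underscore. The function name `count_unique_words` is hypothetical. The function counts lines with `wc -l` rather than `wc --words`. Because `grep -o` prints one match per line and a `\w+` match contains no whitespace, both give the same number here.

```bash
# Hypothetical helper: print how many case-insensitively distinct words FILE contains.
# Assumes GNU grep, where \w+ matches runs of letters, digits and underscores.
count_unique_words() {
  # -o prints each match on its own line; sort -u -f keeps one line per
  # case-folded word; wc -l counts the remaining lines.
  grep -o -E -e '\w+' -- "$1" | sort -u -f | wc -l
}
```

Called as `count_unique_words ./a1esso.doc`, it should report the same count as the first pipeline above. An empty file reports 0.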


How can I capture the names of the files with fewer than 2 unique words, so that I can delete them? I will be scanning millions of files. A find command can list all the files, but it seems the filename needs to be propagated through the pipeline so that rm can be used at the right-hand end.
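One way to keep each filename attached to its result is to print the name next to its count, filter on the count in a later stage, and leave rm for the far right of the pipeline. This is a minimal sketch with several assumptions: bash, GNU grep for `\w`, `find -print0` for the input, and names that contain no newline characters, because the middle of the pipeline is newline-delimited. The functions `tag_with_counts` and `names_below` are hypothetical.

```bash
# Stage 1 (hypothetical): read NUL-terminated filenames on stdin and print
# "COUNT<TAB>FILENAME" for each, so the name travels with its count.
tag_with_counts() {
  local f n
  while IFS= read -r -d '' f; do
    n=$(grep -o -E -e '\w+' -- "$f" | sort -u -f | wc -l)
    printf '%d\t%s\n' "$n" "$f"
  done
}

# Stage 2 (hypothetical): keep only the names whose count is below $1,
# stripping the "COUNT<TAB>" prefix so that tabs inside names survive.
names_below() {
  awk -v max="$1" -F '\t' '$1 < max { sub(/^[0-9]+\t/, ""); print }'
}
```

A dry run would be `find . -type f -name '*.doc' -print0 | tag_with_counts | names_below 2`. Appending `| xargs -d '\n' rm --` would then delete the listed files; `-d` is a GNU xargs option. The `*.doc` pattern is only an assumption based on the example filenames. Each file still costs one grep, one sort and one wc process, which may matter at the scale of millions of files.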



Thanks for reading.



Update:



The correct answer must use an input pipeline to feed filenames. This is not negotiable. The program is not meant for the single input file shown in the example; its input is a dynamic list of many files.



The accepted answer must also include a filter that identifies the names of the files meeting the criterion. This is not negotiable either.










Tags: bash uniq wc














asked Nov 21 '18 at 0:22 by Geoffrey Anderson, edited Nov 21 '18 at 3:07
























          1 Answer
































          You could do this …



           test "$(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words)" -lt 2 && rm ./a1esso.doc


          Update: removed useless cat as per David's comment.
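The one-liner above tests a single, hard-coded file. Below is a minimal sketch of the same test applied to every filename that arrives on standard input, one per line. Any producer can feed it, for example find, or cat of a list file. It assumes bash, GNU grep for `\w`, and names without embedded newlines. The function `remove_sparse_files` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1` the function only prints what it would remove.

```bash
# Hypothetical wrapper around the answer's test: for each filename read from
# stdin (one per line), remove it if it has fewer than 2 distinct words.
# With DRY_RUN=1 the names are printed instead of removed.
remove_sparse_files() {
  local f n
  while IFS= read -r f; do
    [ -f "$f" ] || continue                  # skip missing or non-regular files
    n=$(grep -o -E -e '\w+' -- "$f" | sort -u -f | wc -l)
    if [ "$n" -lt 2 ]; then
      if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'would remove: %s\n' "$f"
      else
        rm -- "$f"
      fi
    fi
  done
}
```

For example, `find . -type f -name '*.doc' | DRY_RUN=1 remove_sparse_files` would list the candidates. Dropping `DRY_RUN=1` would delete them.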














          answered Nov 21 '18 at 0:29 by Red Cricket, edited Nov 21 '18 at 0:38








          • cat ./a1esso.doc is an Unnecessary Use Of cat (UUOc). Instead: grep -o -E '\w+' ./a1esso.doc | ...

            – David C. Rankin
            Nov 21 '18 at 0:37











          • Yes good point.

            – Red Cricket
            Nov 21 '18 at 0:37











          • The answer cannot get chosen as written. The correct answer is going to use cat to feed filenames, as I already showed. This is not negotiable.

            – Geoffrey Anderson
            Nov 21 '18 at 3:05











          • You don't need cat to "feed filenames": grep takes a filename as an argument. cat file | grep ... is equivalent to grep ... file; it is just that the former is considered bad form.

            – Red Cricket
            Nov 21 '18 at 3:19
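To make the point of this exchange concrete: cat writes a file's contents, not its name, so `cat file | grep ...` and `grep ... file` give grep the same bytes. To pass names down a pipeline instead, one option is to print them and let xargs turn them into arguments. This is a minimal sketch that assumes GNU grep for `\w` and `xargs -0`, which GNU and BSD xargs both provide. The temporary file and its contents are made up for illustration.

```bash
# Illustration only: a throwaway file with made-up contents.
demo=$(mktemp)
printf 'alpha beta alpha\n' > "$demo"

# Contents reach grep through a pipe from cat...
via_cat=$(cat "$demo" | grep -o -E -e '\w+')
# ...or grep opens the named file itself; the matches are the same.
via_arg=$(grep -o -E -e '\w+' -- "$demo")

# To send names (not contents) through a pipeline, print them NUL-terminated
# and let xargs pass them to grep as arguments; -H prefixes each match with
# the filename it came from.
via_xargs=$(printf '%s\0' "$demo" | xargs -0 grep -H -o -E -e '\w+' --)

if [ "$via_cat" = "$via_arg" ]; then
  echo "cat | grep and grep FILE produced the same matches"
fi
printf '%s\n' "$via_xargs"
rm -- "$demo"
```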















