Count unique words in all text files in a directory, and delete those having fewer than 2?
This gets me the count. But how do I delete the files whose count is < 2?
$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
How do I grab the filenames of the files that have fewer than 2 unique words, so that they can be deleted? I will be scanning millions of files. A find command can find all the files, but it seems the filename needs to be propagated through the pipeline so that rm can be used at the right-hand end.
Thanks for reading.
Update:
The correct answer is going to use an input pipeline to feed filenames. This is not negotiable. The program is not meant for the single input file shown in the example; its input comes from a dynamic list of many files.
The accepted answer will also contain a filter that identifies the names of the files that meet the criterion. This is not negotiable either.
bash uniq wc
edited Nov 21 '18 at 3:07
Geoffrey Anderson
asked Nov 21 '18 at 0:22
Geoffrey Anderson
1 Answer
You could do this:
test "$(grep -o -E '\w+' ./a1esso.doc | sort -u -f | wc --words)" -lt 2 && rm ./a1esso.doc
Update: removed the useless cat, as per David's comment.
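For the many-files case raised in the question, the one-file test above can be wrapped in a loop that reads filenames from a pipeline. What follows is a minimal sketch and not part of the original answer: it assumes bash plus GNU grep, find, and xargs, assumes the files match *.doc, and the function names unique_word_count and filter_sparse_files are made up for illustration. As written it only prints the rm command as a dry run.

```shell
#!/usr/bin/env bash
# Sketch: find files with fewer than 2 unique words (case-insensitive).
# Assumptions: GNU tools, *.doc files, hypothetical helper names.

# Print the number of distinct words in one file, ignoring case.
unique_word_count() {
    grep -o -E '\w+' -- "$1" | sort -u -f | wc -w
}

# Read NUL-delimited filenames on stdin and re-emit (NUL-delimited)
# only those whose unique word count is below the given minimum.
filter_sparse_files() {
    local min_words=$1 file count
    while IFS= read -r -d '' file; do
        count=$(unique_word_count "$file")
        if (( count < min_words )); then
            printf '%s\0' "$file"
        fi
    done
}

# Dry run: find feeds filenames in, the filter keeps the sparse ones,
# and xargs prints the rm command instead of running it.
find "${1:-.}" -type f -name '*.doc' -print0 \
    | filter_sparse_files 2 \
    | xargs -0 -r echo rm --
```

Replacing `echo rm --` with `rm --` performs the deletion. The NUL delimiters keep filenames with spaces or newlines intact. This starts several processes per file, so it is likely to be slow across millions of files; the trade-off is that the per-file logic stays identical to the one-liner above.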
cat ./a1esso.doc is an Unnecessary Use Of cat (UUOc). Instead: grep -o -E '\w+' ./a1esso.doc | ...
– David C. Rankin
Nov 21 '18 at 0:37
Yes good point.
– Red Cricket
Nov 21 '18 at 0:37
The answer cannot get chosen as written. The correct answer is going to use cat to feed filenames, as I already showed. This is not negotiable.
– Geoffrey Anderson
Nov 21 '18 at 3:05
You don't need cat to "feed filenames". grep takes a filename as an argument. cat file | grep ... is equivalent to grep … file; it is just that the former is considered bad form.
– Red Cricket
Nov 21 '18 at 3:19
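The equivalence described in the last comment can be checked directly. This is a small sketch for illustration only: the temporary file and its contents are made up, and GNU grep is assumed for the \w class.

```shell
#!/usr/bin/env bash
# Sketch: grep can open the file itself, so piping through cat adds nothing.
# The sample file below is hypothetical test data.
sample=$(mktemp)
printf 'one two\nTwo three\n' > "$sample"

# Form 1: grep reads the file named on its command line.
direct=$(grep -o -E '\w+' "$sample")

# Form 2: cat reads the file and grep reads cat's output from stdin.
piped=$(cat "$sample" | grep -o -E '\w+')

if [ "$direct" = "$piped" ]; then
    echo "identical output"
fi

rm -f -- "$sample"
```

Both forms produce the same word list; the first simply starts one process fewer per file, which adds up over a large number of files.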
edited Nov 21 '18 at 0:38
answered Nov 21 '18 at 0:29


Red Cricket