Regex and sed - remove everything in a text file but filenames
I'm trying to clean up a text file that lists files. Here's a sample:
15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo CNEWS-2018-12-01_07-00-00h.mp4 bin 1.5 GB 2018-12-03 16:03:00
1irhwA-tcExWXs-ksyOQuEBYL-LDktMQB franceinfo-2018-12-01_06-30-00h.mp4 bin 949.2 MB 2018-12-03 18:43:10
1UEjtEtU27gMA-Bf7J1rTVhFn9D5z0Rjb LCI-2018-12-01_06-00-00h.mp4 bin 908.2 MB 2018-12-03 17:30:11
1_ouEY6Ugg8h_XvzjE4j4m751o3eMNxhh BFMTV-2018-12-01_05-30-00h.mp4 bin 1.2 GB 2018-12-03 14:33:25
1f7JWvb6PM9PRhFimXKc8k81qiTVKwe-e franceinfo-2018-12-01_04-30-00h.mp4 bin 1.0 GB 2018-12-03 18:43:36
1nKzPZw6tKNzErmWdwbq8f-47DSF4cQbt BFMTV-2018-12-01_03-30-00h.mp4 bin 1.2 GB 2018-12-03 14:33:03
So I think that this expression might work:
([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)
But I've tried a lot of sed commands, like:
sed -n -E 's/([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)/\2/p' /media/partage/v2/backupGdriveListOnline.txt
And nothing seems to work.
Is this the right command to output only the filenames?
regex bash
3
If the filenames don't have whitespace in them, you could do awk '{print $2}' file.
– mickp
Jan 2 at 16:09
$2 for 2=´([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)´ ?
– petaire
Jan 2 at 16:22
(Oh ok, just $2)
– petaire
Jan 2 at 16:24
@petaire $2 means second column.
– Tiw
Jan 2 at 16:24
asked Jan 2 at 16:04
petaire
1 Answer
sed does not support some regex features, such as \d.
Try grep:
grep -ioP '([A-Z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)' text
Output:
CNEWS-2018-12-01_07-00-00h.mp4
franceinfo-2018-12-01_06-30-00h.mp4
LCI-2018-12-01_06-00-00h.mp4
BFMTV-2018-12-01_05-30-00h.mp4
franceinfo-2018-12-01_04-30-00h.mp4
BFMTV-2018-12-01_03-30-00h.mp4
Also, you have a typo in your regex: [A-z] should be [A-Z] (with -i, that covers lowercase too).
-i, --ignore-case: ignore case distinctions
-o, --only-matching: show only the part of a line matching PATTERN
-P, --perl-regexp: PATTERN is a Perl regular expression
I can see you put a lot of effort into your regex, so I suggested this one.
However, apart from awk's clean print $2 way, you can also use sed to strip out everything else:
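As a small illustration of why [A-z] is wider than it looks: in ASCII, a few punctuation characters sit between Z and a, so the range picks them up as well. The sketch below assumes the C locale (forced with LC_ALL=C) and uses made-up one-character input lines, not data from the question.

```shell
# Four test lines: two punctuation characters and two letters.
# In the C locale, [A-z] spans ASCII 65..122, which includes '^' and '_'.
wide=$(printf '%s\n' '_' '^' 'a' 'Z' | LC_ALL=C grep -c '[A-z]')

# [A-Za-z] only accepts the letters.
narrow=$(printf '%s\n' '_' '^' 'a' 'Z' | LC_ALL=C grep -c '[A-Za-z]')

printf 'lines matched by [A-z]: %s\n' "$wide"
printf 'lines matched by [A-Za-z]: %s\n' "$narrow"
```

With these inputs the first count is 4 and the second is 2, so [A-Za-z] (or [A-Z] together with -i) is the safer spelling.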
sed -E 's/^[^ \t]*[ \t]+//;s/(\.mp4).*/\1/' text
This removes everything from the beginning of the line up to and including the first run of whitespace,
and then removes everything after .mp4
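For comparison, here is a sketch of how the sed attempt from the question could be adapted, assuming the filenames always follow the name-date_time pattern shown in the sample. The Perl-style digit shorthand is swapped for [0-9], the dot before mp4 is escaped, and the pattern is made to cover the whole line so that the replacement keeps only the captured filename. The variable names are only for illustration.

```shell
# Sample line copied from the question.
line='15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo CNEWS-2018-12-01_07-00-00h.mp4 bin 1.5 GB 2018-12-03 16:03:00'

# [0-9] stands in for the digit shorthand, which sed -E does not understand.
# The pattern matches the entire line; only capture group 1 (the filename) is printed.
name=$(printf '%s\n' "$line" |
  sed -n -E 's/^.*[[:space:]]([A-Za-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}h\.mp4).*$/\1/p')

printf '%s\n' "$name"
```

Because of -n and the p flag, lines that do not contain a matching filename are simply dropped rather than printed unchanged.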
Is it an oversimplification to just use sed -E 's/^[^ \t]+[ \t]+([^ \t]+).*/\1/' ?
– Paul Hodges
Jan 2 at 19:37
Nice ! Thank you everyone !
– petaire
Jan 2 at 20:31
@PaulHodges It might be: if the filenames contain spaces, it will fail that way. Otherwise it's okay.
– Tiw
Jan 3 at 3:11
1
True, though I was under the impression spaces were being used as a file delimiter, which would be a bad plan if they have embedded spaces. :)
– Paul Hodges
Jan 3 at 14:13
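The trade-off discussed in these comments can be sketched as follows. The listing line is hypothetical (its ID and filename are invented to include a space and are not from the question's data); the two commands are the column-based awk approach and the two-step sed from the answer, written with POSIX character classes.

```shell
# Hypothetical listing line whose filename contains a space.
line='1abcDEF France 24-2018-12-01_07-00-00h.mp4 bin 1.0 GB 2018-12-03 16:03:00'

# Second whitespace-separated column: stops at the embedded space.
by_column=$(printf '%s\n' "$line" | awk '{print $2}')

# Strip the leading ID column, then cut everything after ".mp4".
by_suffix=$(printf '%s\n' "$line" |
  sed -E 's/^[^[:space:]]*[[:space:]]+//; s/(\.mp4).*/\1/')

printf '%s\n' "$by_column" "$by_suffix"
```

Here the column approach yields only "France", while anchoring on the .mp4 suffix keeps the full name. If the names never contain whitespace, both give the same result and the awk version is the simpler choice.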
edited Jan 2 at 16:29
answered Jan 2 at 16:21


Tiw