Regex and sed - remove everything in a text file but filenames

I'm trying to clean up a text file that lists files. Here's a sample:

15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo   CNEWS-2018-12-01_07-00-00h.mp4             bin    1.5 GB     2018-12-03 16:03:00
1irhwA-tcExWXs-ksyOQuEBYL-LDktMQB franceinfo-2018-12-01_06-30-00h.mp4 bin 949.2 MB 2018-12-03 18:43:10
1UEjtEtU27gMA-Bf7J1rTVhFn9D5z0Rjb LCI-2018-12-01_06-00-00h.mp4 bin 908.2 MB 2018-12-03 17:30:11
1_ouEY6Ugg8h_XvzjE4j4m751o3eMNxhh BFMTV-2018-12-01_05-30-00h.mp4 bin 1.2 GB 2018-12-03 14:33:25
1f7JWvb6PM9PRhFimXKc8k81qiTVKwe-e franceinfo-2018-12-01_04-30-00h.mp4 bin 1.0 GB 2018-12-03 18:43:36
1nKzPZw6tKNzErmWdwbq8f-47DSF4cQbt BFMTV-2018-12-01_03-30-00h.mp4 bin 1.2 GB 2018-12-03 14:33:03

I think this expression should match the filenames:

([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)

But I've tried a lot of sed commands, like:

sed -n -E 's/([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)/\2/p' /media/partage/v2/backupGdriveListOnline.txt

And nothing seems to work.

Is this the right command to output only the filenames?

  • If the filenames don't contain whitespace, you could do awk '{print $2}' file.

    – mickp
    Jan 2 at 16:09

  • $2 for 2=´([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)´ ?

    – petaire
    Jan 2 at 16:22

  • (Oh ok, just $2)

    – petaire
    Jan 2 at 16:24

  • @petaire $2 means second column.

    – Tiw
    Jan 2 at 16:24
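
To illustrate the awk suggestion from the comments, here is a minimal sketch that runs it on two of the sample lines from the question. The variable names sample and names are only for illustration, and the sketch assumes, as the comment does, that the filenames contain no whitespace.

```shell
# Two lines copied from the question's sample; the first keeps its padded columns.
sample='15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo   CNEWS-2018-12-01_07-00-00h.mp4             bin    1.5 GB     2018-12-03 16:03:00
1irhwA-tcExWXs-ksyOQuEBYL-LDktMQB franceinfo-2018-12-01_06-30-00h.mp4 bin 949.2 MB 2018-12-03 18:43:10'

# By default awk splits each line on runs of blanks, so $2 is the second
# column no matter how many spaces pad the columns.
names=$(printf '%s\n' "$sample" | awk '{print $2}')
printf '%s\n' "$names"
```

Because awk treats any run of spaces or tabs as one separator, the padded first line and the single-spaced second line are handled the same way.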
















regex bash

asked Jan 2 at 16:04

petaire

1 Answer

sed does not support some regex features, such as the Perl-style \d used in your expression.

Try grep:

grep -ioP '([A-Z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)' text

Output:

CNEWS-2018-12-01_07-00-00h.mp4
franceinfo-2018-12-01_06-30-00h.mp4
LCI-2018-12-01_06-00-00h.mp4
BFMTV-2018-12-01_05-30-00h.mp4
franceinfo-2018-12-01_04-30-00h.mp4
BFMTV-2018-12-01_03-30-00h.mp4

Also, you have a typo in your regex: [A-z] should be [A-Z]. In ASCII order, [A-z] also matches the punctuation characters that sit between Z and a, and with -i, [A-Z] already matches lowercase letters. The grep options used are:

-i, --ignore-case      ignore case distinctions
-o, --only-matching    show only the part of a line matching PATTERN
-P, --perl-regexp      PATTERN is a Perl regular expression

I can see you put a lot of effort into your regex, so I suggested this one.

However, apart from awk's clean print $2 way, you can also use sed to strip everything else:

sed -E 's/^[^ \t]*[ \t]+//;s/(\.mp4).*/\1/' text

This removes everything from the beginning of the line through the first run of spaces or tabs, and then removes everything after .mp4.
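
As a rough sketch of the first point above: sed -E uses POSIX extended regular expressions, where \d is not a digit class, so one way to make the question's sed attempt work is to write [0-9] instead and capture the whole filename in a single group. This is not the command from the answer, only one possible rewrite; the variable names sample and names are hypothetical.

```shell
# Two lines copied from the question's sample.
sample='15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo   CNEWS-2018-12-01_07-00-00h.mp4             bin    1.5 GB     2018-12-03 16:03:00
1UEjtEtU27gMA-Bf7J1rTVhFn9D5z0Rjb LCI-2018-12-01_06-00-00h.mp4 bin 908.2 MB 2018-12-03 17:30:11'

# [0-9] replaces \d, and [A-Za-z] replaces [A-z].
# The leading and trailing .* consume the rest of the line, so the substitution
# leaves only group 1 (the filename); -n with the p flag prints matching lines only.
names=$(printf '%s\n' "$sample" |
  sed -n -E 's/.*[[:space:]]([A-Za-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}h\.mp4).*/\1/p')
printf '%s\n' "$names"
```

Unlike the column-based approaches, this version skips any line that does not contain a filename of the expected shape.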






  • Is it an oversimplification to just use sed -E 's/^[^ \t]+[ \t]+([^ \t]+).*/\1/' ?

    – Paul Hodges
    Jan 2 at 19:37

  • Nice! Thank you everyone!

    – petaire
    Jan 2 at 20:31

  • @PaulHodges It might be: if the filenames contain spaces, it will fail that way. Otherwise it's okay.

    – Tiw
    Jan 3 at 3:11

  • True, though I was under the impression spaces were being used as a file delimiter, which would be a bad plan if the filenames have embedded spaces. :)

    – Paul Hodges
    Jan 3 at 14:13
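
The caveat discussed in these comments can be sketched with a made-up line whose filename contains a space. The input line and the variable names below are hypothetical and not taken from the question, and [[:blank:]] stands in for the [ \t] used in the commands above.

```shell
# Hypothetical line: the filename "my show-...mp4" contains a space.
line='1abcDEF  my show-2018-12-01_07-00-00h.mp4 bin 1.5 GB 2018-12-03 16:03:00'

# Second-field approach: the capture stops at the first blank inside the filename.
second_field=$(printf '%s\n' "$line" |
  sed -E 's/^[^[:blank:]]+[[:blank:]]+([^[:blank:]]+).*/\1/')

# Two-step approach: drop the first column, then cut everything after ".mp4".
up_to_ext=$(printf '%s\n' "$line" |
  sed -E 's/^[^[:blank:]]*[[:blank:]]+//; s/(\.mp4).*/\1/')

printf 'second field: %s\n' "$second_field"
printf 'up to .mp4:   %s\n' "$up_to_ext"
```

On this input the first command keeps only the part of the name before the space, while the second keeps the whole name because it relies on the .mp4 suffix rather than on whitespace to find where the filename ends.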











edited Jan 2 at 16:29

answered Jan 2 at 16:21

Tiw
