Regex and sed - remove everything in a text file but filenames

I'm trying to clean up a text file that contains a list of files. Here's a sample:

15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo   CNEWS-2018-12-01_07-00-00h.mp4             bin    1.5 GB     2018-12-03 16:03:00

1irhwA-tcExWXs-ksyOQuEBYL-LDktMQB   franceinfo-2018-12-01_06-30-00h.mp4        bin    949.2 MB   2018-12-03 18:43:10

1UEjtEtU27gMA-Bf7J1rTVhFn9D5z0Rjb   LCI-2018-12-01_06-00-00h.mp4               bin    908.2 MB   2018-12-03 17:30:11

1_ouEY6Ugg8h_XvzjE4j4m751o3eMNxhh   BFMTV-2018-12-01_05-30-00h.mp4             bin    1.2 GB     2018-12-03 14:33:25

1f7JWvb6PM9PRhFimXKc8k81qiTVKwe-e   franceinfo-2018-12-01_04-30-00h.mp4        bin    1.0 GB     2018-12-03 18:43:36

1nKzPZw6tKNzErmWdwbq8f-47DSF4cQbt   BFMTV-2018-12-01_03-30-00h.mp4             bin    1.2 GB     2018-12-03 14:33:03

I think this expression might work:

([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)

But I've tried a lot of sed commands, such as:

sed -n -E 's/([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)/\2/p' /media/partage/v2/backupGdriveListOnline.txt

And nothing seems to work.

Is this the right command to output only the filenames?

asked Jan 2 at 16:04

petaire


If the filenames don't contain whitespace, you could use awk '{print $2}' file.

– mickp
Jan 2 at 16:09

$2 for 2=´([A-z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)´ ?

– petaire
Jan 2 at 16:22

(Oh ok, just $2)

– petaire
Jan 2 at 16:24

@petaire $2 means second column.

– Tiw
Jan 2 at 16:24

regex bash

1 Answer

sed does not support some regex features. In particular, sed -E uses POSIX extended regular expressions, which have no \d shorthand for digits.

Try grep:

grep -ioP '([A-Z])*(-)(\d{4})(-)(\d{2})(-)(\d{2})_(\d{2})-(\d{2})-(\d{2}h)(.)(mp4)' text

Output:

CNEWS-2018-12-01_07-00-00h.mp4

franceinfo-2018-12-01_06-30-00h.mp4

LCI-2018-12-01_06-00-00h.mp4

BFMTV-2018-12-01_05-30-00h.mp4

franceinfo-2018-12-01_04-30-00h.mp4

BFMTV-2018-12-01_03-30-00h.mp4
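
If grep -P is not available, a sed-only variant is possible. The following is a minimal sketch under stated assumptions, not the asker's original command: it swaps \d for [0-9], uses [A-Za-z]+ for the channel name, and prints only capture group 1. The function name extract_names and the inline input are hypothetical; the two input lines are copied from the sample in the question.

```shell
# Hypothetical helper: print only the filename from each listing line.
# POSIX ERE (sed -E) has no \d, so [0-9] is used instead.
extract_names() {
  sed -n -E 's/.*[[:space:]]([A-Za-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}h\.mp4).*/\1/p'
}

# Two lines copied from the sample listing in the question.
printf '%s\n' \
  '15Tlb3Bsn5ec71Os6paEyTpf-5YkTsjwo   CNEWS-2018-12-01_07-00-00h.mp4             bin    1.5 GB     2018-12-03 16:03:00' \
  '1irhwA-tcExWXs-ksyOQuEBYL-LDktMQB   franceinfo-2018-12-01_06-30-00h.mp4        bin    949.2 MB   2018-12-03 18:43:10' |
  extract_names
```

Because of -n and the p flag, lines that do not contain a matching filename are dropped rather than printed unchanged.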

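Also, you have a typo in your regex: [A-z] should be [A-Z], because the range A-z also covers the punctuation characters that sit between Z and a in ASCII. With -i, [A-Z] also matches lowercase names such as franceinfo.

The grep options used here are:

-i, --ignore-case       ignore case distinctions
-o, --only-matching     show only the part of a line matching PATTERN
-P, --perl-regexp       PATTERN is a Perl regular expression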
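
To see why the range matters, here is a small check. It is only a sketch: the helper name count_matches is made up for illustration, and LC_ALL=C is set on the assumption that the range should follow plain ASCII order.

```shell
# In ASCII, the range A-z also spans [ \ ] ^ _ ` (codes 91-96).
# LC_ALL=C is set so the bracket range follows byte order.
count_matches() {
  # $1 = bracket expression; reads candidate lines from stdin and
  # prints how many of them contain a match. "|| true" hides grep's
  # non-zero exit status when the count is 0.
  LC_ALL=C grep -c -- "$1" || true
}

printf '%s\n' '_' '^' | count_matches '[A-z]'      # prints 2
printf '%s\n' '_' '^' | count_matches '[A-Za-z]'   # prints 0
```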

I can see you put a lot of effort into your regex, so I suggested a command that reuses it.

However, apart from awk's clean print $2 approach, you can also use sed to strip everything around the filename:

sed -E 's/^[^ \t]*[ \t]+//;s/(.mp4).*/\1/' text

The first substitution removes everything from the beginning of the line up to and including the first run of spaces or tabs, and the second removes everything after .mp4.
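
The two substitutions can be tried on a single line from the question. This is a sketch rather than the exact command above: the wrapper name strip_columns is hypothetical, [[:space:]] stands in for the space-or-tab bracket on the assumption that it is more portable across sed implementations, and the dot before mp4 is escaped so it matches only a literal dot.

```shell
# Hypothetical helper wrapping the two substitutions:
#   1) drop the first column and the whitespace after it
#   2) drop everything after ".mp4"
strip_columns() {
  sed -E 's/^[^[:space:]]*[[:space:]]+//; s/(\.mp4).*/\1/'
}

# One line copied from the sample listing in the question.
printf '%s\n' \
  '1UEjtEtU27gMA-Bf7J1rTVhFn9D5z0Rjb   LCI-2018-12-01_06-00-00h.mp4               bin    908.2 MB   2018-12-03 17:30:11' |
  strip_columns
```

Unlike a column-based approach, this keeps a filename intact even if it contains spaces, as long as it ends in .mp4.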

edited Jan 2 at 16:29

answered Jan 2 at 16:21

Tiw


Is it an oversimplification to just use sed -E 's/^[^ \t]+[ \t]+([^ \t]+).*/\1/' ?

– Paul Hodges
Jan 2 at 19:37

Nice! Thank you, everyone!

– petaire
Jan 2 at 20:31

@PaulHodges It might be: if the filenames contain spaces, it will fail. Otherwise it's okay.

– Tiw
Jan 3 at 3:11

True, though I was under the impression spaces were being used as a file delimiter, which would be a bad plan if they have embedded spaces. :)

– Paul Hodges
Jan 3 at 14:13
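
The whitespace caveat discussed in these comments can be illustrated with a short sketch. The helper name second_field and the second input line (a filename with an embedded space) are made up for illustration; the first input line follows the sample in the question with its column padding shortened.

```shell
# Hypothetical helper: print the second whitespace-separated field.
second_field() {
  awk '{ print $2 }'
}

# A line from the question: the filename is a single field.
printf '%s\n' '1nKzPZw6tKNzErmWdwbq8f-47DSF4cQbt   BFMTV-2018-12-01_03-30-00h.mp4   bin   1.2 GB   2018-12-03 14:33:03' |
  second_field   # prints BFMTV-2018-12-01_03-30-00h.mp4

# Made-up line with a space inside the filename: only the first word survives.
printf '%s\n' 'someId   my show-2018-12-01_03-30-00h.mp4   bin   1.2 GB   2018-12-03 14:33:03' |
  second_field   # prints my
```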
