Extract information between square brackets for line in text file starting with >
I have many text files which look like this:
>CAA97360; SPAC26F1.03 [SCHPO]
M-----FRTCTKIGTVPKVLVNQKGLIDGLRRVTTDATTSRANPAHVPEEHDKPFPVKLD
DSVFEGYKIDVPSTEIEVTKGELLGLYEKMVTIRRLELACDALYKAKKIRGFCHLSIGQE
I want to extract the information between the square brackets and retain the M----FRT etc sequence below. So I want the text to look like this:
>SCHPO
M-----FRTCTKIGTVPKVLVNQKGLIDGLRRVTTDATTSRANPAHVPEEHDKPFPVKLD
DSVFEGYKIDVPSTEIEVTKGELLGLYEKMVTIRRLELACDALYKAKKIRGFCHLSIGQE
How could I do this using the terminal please?
bash command-line terminal
If it has to be bash scripting, I suggest you look into using AWK. There's a good tutorial here: cyberciti.biz/faq/bash-scripting-using-awk
– ahota
Nov 20 '18 at 18:00
asked Nov 20 '18 at 17:50
Katie_C94
1 Answer
With AWK, try something like:
gawk '{print gensub(/^>.*\[(.+)\]/, ">\\1", 1)}' text
which outputs:
>SCHPO
M-----FRTCTKIGTVPKVLVNQKGLIDGLRRVTTDATTSRANPAHVPEEHDKPFPVKLD
DSVFEGYKIDVPSTEIEVTKGELLGLYEKMVTIRRLELACDALYKAKKIRGFCHLSIGQE
Explanations:
- The awk function gensub() (a gawk extension) searches the target string (by default $0, the current line) for the regular expression given as the 1st argument and replaces the matched string with the 2nd argument. The 3rd argument, 1, means that only the first match is replaced. (This is only a rough overview of gensub(); see the gawk man page for detailed explanations.)
- The regular expression /^>.*\[(.+)\]/ matches a line which starts with '>', followed by some characters, and then a substring surrounded by square brackets. The brackets are escaped with backslashes so that they are taken literally. Pay attention to the parentheses around the pattern within the square brackets.
- In the 2nd argument, \\1 (the leftmost backslash just escapes the next one) refers to the 1st parenthesized expression in the regular expression above. It is called a back reference, and it lets you reuse the matched substring (the information between the square brackets in this case).
- If the pattern matches, gensub() returns the modified string. Otherwise it returns the original string. So just writing print gensub(...) works for both matched and unmatched lines.
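Since the question mentions many files, and gensub() is only available in gawk, here is a minimal sketch of the same substitution written for plain sed and applied to a whole directory. The function names relabel_headers and relabel_dir, the *.txt input pattern, and the .relabelled output suffix are assumptions made for illustration; none of them come from the question or the answer.

```shell
#!/bin/sh
# Hypothetical helper: on lines starting with '>', keep only the text
# inside the last [...] pair; every other line passes through unchanged.
# Uses a basic regular expression, so no sed extensions are needed.
relabel_headers() {
    sed 's/^>.*\[\([^]]*\)].*$/>\1/' "$@"
}

# Hypothetical helper: run relabel_headers on every *.txt file in a
# directory (default: current directory), writing NAME.relabelled next to
# each NAME.txt. The originals are left untouched.
relabel_dir() {
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue    # no matching files: skip the literal pattern
        relabel_headers "$f" > "${f%.txt}.relabelled"
    done
}
```

Calling relabel_dir with no argument processes the current directory; relabel_headers can also be used on its own as a filter, for example relabel_headers text, in place of the gawk command. Writing to a separate output file rather than editing in place makes it easy to compare the result with the original before deleting anything.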
Hope this helps.
This worked perfectly, thank you so much. The explanation was also really helpful
– Katie_C94
Nov 26 '18 at 14:16
Good to know I could be helpful to you. I would appreciate it if you could take an action following the instruction. BR.
– tshiono
Nov 27 '18 at 1:39
answered Nov 21 '18 at 0:06
tshiono