Extract a specific string from a text file and create an HTTP request with the extracted string

I'm trying to extract a specific string value from a text file (file1.txt),

then create an HTTP GET request with the extracted string (a URL).

The HTTP response should be saved as a new HTML file in the
directory.

The string I'm trying to extract is the value of a specific key.
For example: "display_url":"test.com" (extract "test.com" and then create an HTTP request with it).

file1.txt can contain multiple instances of display_url, since it is in a list under urls. If there is more than one value, I want to make an HTTP request for each of them.

My txt file content:

{"created_at":"Thu Nov 15 11:35:00 +0000 2018","id":15292802,"id_str":325802","text":"test8 https://test/ZtCsuk7Ek2 #osining","source":"u003ca href="http://twitter.com" rel="nofollow"u003eTwitter Web Clientu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":961508561217052675,"id_str":"961508561217052675","name":"Online S","screen_name":"osectraining","location":"Israel","url":"https://www.test.co.il","description":"test","translator_type":"none","protected":false,"verified":false,"followers_count":2,"friends_count":51,"listed_count":0,"favourites_count":0,"statuses_count":7,"created_at":"Thu Feb 08 07:54:39 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/961508561217052675/1518076913","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"osectraining","indices":[33,46]}],"urls":[{"url":"https://test/ZtCsuk7Ek2","expanded_url":"http
://test.com","display_url":"test.com","indices":[7,30]}],"user_mentions":,"symbols":},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1542281700508"}

edited Nov 21 '18 at 9:41

asked Nov 21 '18 at 9:02

bugnet17


The structure of your file content implies that there could be multiple instances of display_url, since it is in a list under urls. What should happen if multiple are found?

– lxop
Nov 21 '18 at 9:17

2

You are asking multiple questions at once. It would be wise to split up your question.

– Micha Wiedenmann
Nov 21 '18 at 9:19

This talks about parsing JSON with BASH: stackoverflow.com/questions/1955505/…

– Micha Wiedenmann
Nov 21 '18 at 9:19

Also, your txt file has a bonus " in it at the end of the id_str value

– lxop
Nov 21 '18 at 9:24

Edited my question.

– bugnet17
Nov 21 '18 at 9:41

bash


1 Answer
1


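1) Your file does not look like valid JSON, so for step 1 you have to do something like this (the -P option needs GNU grep):

url=$(grep -oP '(?<=display_url":")[^"]+' /tmp/x.txt)

2 & 3) Now you can fetch the URL and save the response, something like this:

curl -o /tmp/x.html "$url"

If the file contains more than one display_url, loop over the matches, like this:

display_urls=$(grep -oP '(?<=display_url":")[^"]+' /tmp/x.txt)

for url in $display_urls; do

    # -o names the output file; -O takes no argument and would use the remote file name
    curl -o "/tmp/$url.html" "$url"

done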
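As a rough sketch of how the steps above could fit together in one script: the function names below are made up for illustration, the extraction uses grep -o plus cut instead of grep -oP (so it does not depend on PCRE support), and the file names are sanitized because a display_url may contain characters such as "/" that are not safe in a file name. Treat it as a starting point under those assumptions, not a tested solution for your real data.

```shell
#!/usr/bin/env bash
# Sketch: fetch every display_url found in a file and save each response as HTML.
# Assumes grep with -o support and curl; all function names are hypothetical.

# Print every display_url value in file $1, one per line.
extract_display_urls() {
    grep -o '"display_url":"[^"]*"' "$1" | cut -d'"' -f4
}

# Turn a URL into a safe file name: anything outside [A-Za-z0-9._-] becomes "_".
url_to_filename() {
    printf '%s.html\n' "$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')"
}

# Fetch each distinct URL from input file $1 and save it under directory $2.
fetch_all() {
    mkdir -p "$2"
    extract_display_urls "$1" | sort -u | while IFS= read -r url; do
        # -sS: quiet but still report errors; -L: follow redirects; -o: output file.
        curl -sSL -o "$2/$(url_to_filename "$url")" "$url" </dev/null ||
            echo "failed: $url" >&2
    done
}

# Run only when two arguments are given, e.g.: ./fetch.sh file1.txt ./out
if [ "$#" -eq 2 ]; then
    fetch_all "$1" "$2"
fi
```

Reading the matches with while read instead of for url in $list avoids word splitting and globbing surprises if a value ever contains unusual characters.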

answered Nov 21 '18 at 13:42

Vladimir Kovpak


And what if the file is in append mode? (I mean that new content is added to the file every so often.)

– bugnet17
Nov 21 '18 at 16:29

In this case use: curl $url >> /tmp/x.html.

– Vladimir Kovpak
Nov 21 '18 at 16:44

No, I mean the file that I'm reading: x.txt

– bugnet17
Nov 21 '18 at 16:57

In this case you need a loop that traverses the array obtained from grep -oP '(?<=display_url":")[^"]+'.

– Vladimir Kovpak
Nov 21 '18 at 17:02
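For the case where x.txt keeps growing, one possible approach (an assumption on my side, not something spelled out in the comments above) is to follow the file with tail -F and handle each display_url the first time it shows up. The function names are hypothetical, and a temporary file is used to remember which URLs were already handled.

```shell
#!/usr/bin/env bash
# Sketch: react to display_url values as lines are appended to a file.
# Assumes grep with -o support, tail with -F, and curl; names are hypothetical.

# Read lines on stdin; call the handler ("$@") once per not-yet-seen display_url.
process_stream() {
    seen_file=$(mktemp)
    while IFS= read -r line; do
        printf '%s\n' "$line" |
            grep -o '"display_url":"[^"]*"' | cut -d'"' -f4 |
            while IFS= read -r url; do
                [ -n "$url" ] || continue
                # Skip URLs already handled (exact, whole-line match).
                if grep -qxF -- "$url" "$seen_file"; then continue; fi
                printf '%s\n' "$url" >> "$seen_file"
                "$@" "$url" </dev/null
            done
    done
    # Only reached when the input ends (never, while tail -F is running).
    rm -f "$seen_file"
}

# Handler: save the response for one URL under /tmp with a sanitized name.
fetch_one() {
    name=$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')
    curl -sSL -o "/tmp/$name.html" "$1" || echo "failed: $1" >&2
}

# Follow the file from its first line and keep going as it grows, e.g.:
#   ./watch.sh /tmp/x.txt
if [ "$#" -eq 1 ]; then
    tail -n +1 -F "$1" | process_stream fetch_one
fi
```

Because the handler is passed in as an argument, you can dry-run the logic with something like printf '%s\n' '{"display_url":"test.com"}' | process_stream echo before pointing it at curl.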

