Extract a specific string from a text file and create an HTTP request with the extracted string




























  1. I'm trying to extract a specific string value from a text file (file1.txt).

  2. Then I want to make an HTTP GET request using the extracted string (a URL).

  3. The HTTP response should be saved as a new HTML file in the
    directory.


The string I'm trying to extract is the value of a specific key.
For example: "display_url":"test.com" (extract "test.com" and then make an HTTP request to it).

file1.txt can contain multiple instances of display_url, since it is in a list under urls. If there is more than one value, I want to make an HTTP request for each of them.

My txt file content:



{"created_at":"Thu Nov 15 11:35:00 +0000 2018","id":15292802,"id_str":325802","text":"test8 https://test/ZtCsuk7Ek2 #osining","source":"u003ca href="http://twitter.com" rel="nofollow"u003eTwitter Web Clientu003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":961508561217052675,"id_str":"961508561217052675","name":"Online S","screen_name":"osectraining","location":"Israel","url":"https://www.test.co.il","description":"test","translator_type":"none","protected":false,"verified":false,"followers_count":2,"friends_count":51,"listed_count":0,"favourites_count":0,"statuses_count":7,"created_at":"Thu Feb 08 07:54:39 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/961510231346958336/d_KhBeTD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/961508561217052675/1518076913","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"osectraining","indices":[33,46]}],"urls":[{"url":"https://test/ZtCsuk7Ek2","expanded_url":"http
://test.com","display_url":"test.com","indices":[7,30]}],"user_mentions":,"symbols":},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1542281700508"}









  • The structure of your file content implies that there could be multiple instances of display_url, since it is in a list under urls. What should happen if multiple are found?

    – lxop
    Nov 21 '18 at 9:17






  • You are asking multiple questions at once. It would be wise to split up your question.

    – Micha Wiedenmann
    Nov 21 '18 at 9:19











  • This talks about parsing JSON with BASH: stackoverflow.com/questions/1955505/…

    – Micha Wiedenmann
    Nov 21 '18 at 9:19











  • Also, your txt file has a stray " in the id_str value (325802" has a closing quote but no opening one)

    – lxop
    Nov 21 '18 at 9:24











  • Edited my question.

    – bugnet17
    Nov 21 '18 at 9:41
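
As a comment above points out, JSON can be parsed properly from Bash instead of being matched as text. The following is a minimal sketch of that route, assuming jq is installed and that each line of the file is a valid JSON object with the URLs under entities.urls (the sample as pasted is not valid JSON, so it would first need to be repaired). The function name list_display_urls is made up for illustration.

```bash
# Print every display_url found under entities.urls, one per line.
# Assumes jq is installed and the file contains valid JSON objects.
list_display_urls() {
    jq -r '.entities.urls[].display_url' "$1"
}
```

Calling list_display_urls file1.txt would then print one URL per line, ready to be fed into a loop.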
















bash






asked Nov 21 '18 at 9:02 by bugnet17, edited Nov 21 '18 at 9:41

























1 Answer














1) Your file does not look like valid JSON, so for step #1 you have to fall back on a text match, something like this:

url=$(grep -oP '(?<=display_url":")[^"]+' /tmp/x.txt)

2 & 3) Now you can fetch the URL and save the response (-o names the output file; -O takes no argument):

curl "$url" -o /tmp/x.html

If there is more than one display_url, loop over the matches, like this:

for url in $(grep -oP '(?<=display_url":")[^"]+' /tmp/x.txt); do
    curl "$url" -o "/tmp/$url.html"
done
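
Putting the three steps together, a minimal sketch might look like the following. The function names (extract_display_urls, safe_name, fetch_all) are made up for illustration, and the sketch assumes GNU grep with -P support and an installed curl. Since a display_url may contain characters such as "/", anything outside letters, digits, ".", "_" and "-" is mapped to "_" before it is used as a file name.

```bash
# Print every display_url value found in file $1, one per line.
extract_display_urls() {
    grep -oP '(?<=display_url":")[^"]+' "$1"
}

# Turn a URL into a string that is safe to use as a file name
# (hypothetical helper, not part of the original answer).
safe_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Fetch each URL extracted from file $1 and save the response
# as <name>.html inside directory $2.
fetch_all() {
    fa_file=$1
    fa_outdir=$2
    mkdir -p "$fa_outdir"
    extract_display_urls "$fa_file" | while IFS= read -r url; do
        # -sS: quiet but still report errors, -L: follow redirects
        curl -sSL --max-time 30 -o "$fa_outdir/$(safe_name "$url").html" "$url" \
            || echo "failed: $url" >&2
    done
}
```

It could be invoked as fetch_all /tmp/x.txt /tmp/pages. Reading the matches line by line, rather than word-splitting a variable, keeps each URL intact even if it contains unusual characters.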





  • And what if the file is in append mode? (I mean that every so often more content is added to the file.)

    – bugnet17
    Nov 21 '18 at 16:29











  • In this case use: curl $url >> /tmp/x.html.

    – Vladimir Kovpak
    Nov 21 '18 at 16:44













  • No. I mean to the file that I'm reading: x.txt

    – bugnet17
    Nov 21 '18 at 16:57











  • In this case you need a loop that traverses the array obtained from grep -oP '(?<=display_url":")[^"]+'.

    – Vladimir Kovpak
    Nov 21 '18 at 17:02
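
For the append case raised in these comments, one possible approach (a sketch, not something the answer itself spells out) is to remember how many bytes of the file have already been processed and to scan only what was added since. The function name new_display_urls and the state file are assumptions made for illustration; the sketch also assumes GNU grep with -P and that whole lines are appended at a time, since a line caught half-written could split a URL.

```bash
# Print display_url values found in the part of file $1 that was appended
# since the previous call. The byte offset is remembered in state file $2.
new_display_urls() {
    nd_file=$1
    nd_state=$2
    nd_offset=0
    if [ -f "$nd_state" ]; then
        nd_offset=$(cat "$nd_state")
    fi
    nd_size=$(($(wc -c < "$nd_file")))
    # If the file shrank (truncated or rotated), start again from the top.
    if [ "$nd_size" -lt "$nd_offset" ]; then
        nd_offset=0
    fi
    # Read only the bytes between the old offset and the size measured above.
    tail -c "+$((nd_offset + 1))" "$nd_file" \
        | head -c "$((nd_size - nd_offset))" \
        | grep -oP '(?<=display_url":")[^"]+' || true
    echo "$nd_size" > "$nd_state"
}
```

Run periodically (for example from cron), each call to new_display_urls /tmp/x.txt /tmp/x.offset would print only the URLs that arrived since the last run, which can then be passed to curl in a loop.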











answered Nov 21 '18 at 13:42 by Vladimir Kovpak












