How to organize data in a JSON file created through web scraping

I'm trying to get article titles from Yahoo News and organize them in a JSON file. When I dump the data to a JSON file, it comes out confusing to read. How would I go about organizing the data, either after the dump or from the beginning?



This is for a web scraping project where I have to get top news articles and their bodies and export them to a JSON file, which can then be sent to someone else's program. For now, I'm just working on getting the titles from the Yahoo Finance homepage.



import requests
import json
from bs4 import BeautifulSoup

#Getting webpage
page = requests.get("https://finance.yahoo.com/")
soup = BeautifulSoup(page.content, 'html.parser') #creating instance of class to parse the page
#Getting article title
title = soup.find_all(class_="Mb(5px)")
desc = soup.find_all(class_="Fz(14px) Lh(19px) Fz(13px)--sm1024 Lh(17px)--sm1024 LineClamp(3,57px) LineClamp(3,51px)--sm1024 M(0)")
#Getting article bodies
page2 = requests.get("https://finance.yahoo.com/news/warren-buffett-suggests-read-19th-204800450.html")
soup2 = BeautifulSoup(page2.content, 'html.parser')
body = soup2.find_all(class_="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm", id="15") #soup2, not soup: the body lives on the article page


#Organizing data for export
data = {'title1': title[0].get_text(),
        'title2': title[1].get_text(),
        'title3': title[2].get_text(),
        'title4': title[3].get_text(),
        'title5': title[4].get_text()}

#Exporting the data to results.json
with open("results.json", "w") as write_file:
    json.dump(str(data), write_file)


This is what ends up being written to the json file (at the time of writing this post):



"{'title1': 'These US taxpayers face higher payments thanks to new law', 
'title2': 'These 12 Stocks Are the Best Values in 2019, According to Pros
Who\u2019ve Outsmarted the Market', '\ntitle3': 'The Best Move You Can
Make With Your Investments in 2019, According to 5 Market Professionals',
'title4': 'The auto industry said goodbye to a lot of cars in 2018',
'title5': '7 Stock Picks From Top-Rated Wall Street Analysts'}"


I would like the code to show each article title on a separate line and to remove the stray \n's and \u escapes that appear in the middle.

python json beautifulsoup repl.it

asked Jan 1 at 18:01 by Ganlas

  • JSON is relatively human-readable, but isn't a 'pretty' output format. If you want pretty output, then you need to read in the file and parse it for output, though as you say this is for import to another program, I'm not sure why you're worried about this?

    – match
    Jan 1 at 18:11











  • try json.dump(data, write_file, indent=4)

    – t.m.adam
    Jan 1 at 18:40











  • @match I mainly wanted to remove the unnecessary escape characters to make it easier for the next group to analyze

    – Ganlas
    Jan 2 at 1:05
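
The indent=4 suggestion above is already enough to put each title on its own line, as long as the dict itself is dumped rather than str(data); a minimal sketch, with two of the scraped titles hard-coded for illustration:

import json

#Two titles hard-coded so the sketch runs without scraping
data = {'title1': 'These US taxpayers face higher payments thanks to new law',
        'title2': '7 Stock Picks From Top-Rated Wall Street Analysts'}

#Dump the dict itself (not str(data)) and let indent=4 pretty-print it
with open("results.json", "w") as write_file:
    json.dump(data, write_file, indent=4)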

2 Answers

I ran your code but didn't get the same result you did. You defined the key 'title3' as a constant, yet your output shows '\ntitle3' with a stray newline, which didn't happen in my case. You were getting the \'s because the output wasn't encoded correctly: open the file with 'utf8' encoding and set ensure_ascii to False. I would suggest two changes: the 'lxml' parser instead of 'html.parser', and this code snippet:



with open("results.json", "w", encoding='utf8') as write_file:
    json.dump(str(data), write_file, ensure_ascii=False)  #note: str(data) still writes one quoted string; pass data itself for real JSON


This worked for me: the \'s are excluded and the ASCII-escape issue is solved as well.
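
To make the difference concrete, a standalone sketch (one title hard-coded) of what str(data) does versus dumping the dict itself:

import json

data = {'title1': 'Who\u2019ve Outsmarted the Market'}

#str(data) turns the dict into one Python-repr string, so json.dumps writes
#a single quoted JSON string, \u escapes and all
print(json.dumps(str(data)))

#dumping the dict itself with ensure_ascii=False writes real JSON and keeps
#the curly apostrophe readable
print(json.dumps(data, ensure_ascii=False))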

answered Jan 1 at 18:56 by Mobasshir Bhuiyan

    import requests
    import json
    from bs4 import BeautifulSoup
    #Getting webpage
    page = requests.get("https://finance.yahoo.com/")
    soup = BeautifulSoup(page.content, 'html.parser') #creating instance of class to parse the page
    #Getting article titles
    title = soup.find_all(class_="Mb(5px)")
    desc = soup.find_all(class_="Fz(14px) Lh(19px) Fz(13px)--sm1024 Lh(17px)--sm1024 LineClamp(3,57px) LineClamp(3,51px)--sm1024 M(0)")
    #Getting article bodies
    page2 = requests.get("https://finance.yahoo.com/news/warren-buffett-suggests-read-19th-204800450.html")
    soup2 = BeautifulSoup(page2.content, 'html.parser')
    body = soup2.find_all(class_="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm", id="15") #soup2: the body lives on the article page
    title = [x.get_text().strip() for x in title] #strip() removes the stray newlines around each title
    limit = len(title) #change this to 5 if you need only the first 5
    data = {"title" + str(i + 1): title[i] for i in range(0, limit)}
    with open("results.json", "w", encoding='utf-8') as write_file:
        write_file.write(json.dumps(data, ensure_ascii=False, indent=4))


    results.json:



    {
    "title1": "These 12 Stocks Are the Best Values in 2019, According to Pros Who’ve Outsmarted the Market",
    "title2": "These US taxpayers face higher payments thanks to new law",
    "title3": "The Best Move You Can Make With Your Investments in 2019, According to 5 Market Professionals",
    "title4": "Cramer Remix: Here's where your first $10,000 should be i...",
    "title5": "The auto industry said goodbye to a lot of cars in 2018",
    "title6": "Ocado Pips Adyen to Take Crown of 2018's Best European Stock",
    "title7": "7 Stock Picks From Top-Rated Wall Street Analysts",
    "title8": "Buy IBM Stock as It Begins 2019 as the Cheapest Dow Component",
    "title9": "$70 Oil Could Be Right Around The Corner",
    "title10": "What Is the Highest Credit Score and How Do You Get It?",
    "title11": "Silver Price Forecast – Silver markets stall on New Year’s Eve",
    "title12": "This Chart Says the S&P 500 Could Rebound in 2019",
    "title13": "Should You Buy Some Berkshire Hathaway Stock?",
    "title14": "How Much Does a Financial Advisor Cost?",
    "title15": "Here Are the World's Biggest Billionaire Winners and Losers of 2018",
    "title16": "Tax tips: What you need to know before you file your taxes in 2019",
    "title17": "Kevin O’Leary: Make This Your Top New Year’s Resolution",
    "title18": "Dakota Access pipeline developer slow to replace some trees",
    "title19": "Einhorn's Greenlight Extends Decline to 34% in Worst Year",
    "title20": "4 companies to watch in 2019",
    "title21": "What Is My Debt-to-Income Ratio?",
    "title22": "US recession unlikely, market volatility to continue in 2019, El-Erian says",
    "title23": "Fidelity: Ignore stock market turbulence and stick to long-term goals",
    "title24": "Tax season: How you can come out a winner",
    "title25": "IBD 50 Growth Stocks To Watch"
    }
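
    Since results.json is meant to be consumed by someone else's program, a quick sanity check is to read it back the way that program would; a small sketch (the consuming side is hypothetical):

    import json

    #Load the exported file back in, as the receiving program would
    with open("results.json", encoding='utf-8') as f:
        data = json.load(f)

    for key, title in data.items():
        print(key, "->", title)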

    answered Jan 1 at 19:05 by Bitto Bennichan (edited Jan 1 at 19:11)