How to organize data in a JSON file created through web scraping
I'm trying to get article titles from Yahoo News and organize them in a JSON file. When I dump the data to a JSON file it is confusing to read. How would I go about organizing the data, either after the dump or from the beginning?
This is for a web scraping project where I have to get top news articles and their bodies and export them to a JSON file, which will then be sent to someone else's program. For now, I'm just working on getting the titles from the Yahoo Finance homepage.
import requests
import json
from bs4 import BeautifulSoup

# Getting webpage
page = requests.get("https://finance.yahoo.com/")
soup = BeautifulSoup(page.content, 'html.parser')  # creating instance of class to parse the page

# Getting article titles
title = soup.find_all(class_="Mb(5px)")
desc = soup.find_all(class_="Fz(14px) Lh(19px) Fz(13px)--sm1024 Lh(17px)--sm1024 LineClamp(3,57px) LineClamp(3,51px)--sm1024 M(0)")

# Getting article bodies
page2 = requests.get("https://finance.yahoo.com/news/warren-buffett-suggests-read-19th-204800450.html")
soup2 = BeautifulSoup(page2.content, 'html.parser')
body = soup.find_all(class_="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm", id="15")

# Organizing data for export
data = {'title1': title[0].get_text(),
        'title2': title[1].get_text(),
        'title3': title[2].get_text(),
        'title4': title[3].get_text(),
        'title5': title[4].get_text()}

# Exporting the data to results.json
with open("results.json", "w") as write_file:
    json.dump(str(data), write_file)
This is what ends up being written in the JSON file (at the time of writing this post):
"{'title1': 'These US taxpayers face higher payments thanks to new law',
'title2': 'These 12 Stocks Are the Best Values in 2019, According to Pros
Who\u2019ve Outsmarted the Market', '\ntitle3': 'The Best Move You Can
Make With Your Investments in 2019, According to 5 Market Professionals',
'title4': 'The auto industry said goodbye to a lot of cars in 2018',
'title5': '7 Stock Picks From Top-Rated Wall Street Analysts'}"
I would like the code to show each article title on a separate line and remove the stray \u2019 escape sequences that appear in the middle.
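The root of the unreadable output is the str(data) call: it serializes the whole dict as one Python-repr string instead of a JSON object. A minimal sketch contrasting the two (the sample titles here are made up):

```python
import json

data = {"title1": "Example headline",
        "title2": "Who\u2019ve outsmarted the market"}

# Dumping str(data) produces a single quoted string full of escapes...
as_string = json.dumps(str(data))

# ...while dumping the dict itself with indent= gives one key per line.
as_object = json.dumps(data, indent=4, ensure_ascii=False)

print(as_string)
print(as_object)
```

Parsing the first form back yields a string, not a dict, which is why the receiving program would also struggle with it.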
python json beautifulsoup repl.it
JSON is relatively human-readable, but isn't a 'pretty' output format. If you want pretty output, you need to read the file back in and parse it for output. Though, as you say, this is for import into another program, so I'm not sure why you're worried about this?
– match, Jan 1 at 18:11
Try json.dump(data, write_file, indent=4)
– t.m.adam, Jan 1 at 18:40
@match I mainly wanted to remove the unnecessary \u2019 escapes to make it easier for the next group to analyze.
– Ganlas, Jan 2 at 1:05
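As match's comment notes, pretty output means reading the file back and formatting it yourself. A minimal round-trip sketch (the filename and sample titles are illustrative, not from the scraper):

```python
import json

# Dump a dict, read it back, and print one title per line.
sample = {"title1": "First headline", "title2": "Second headline"}

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=4)

with open("results.json", encoding="utf-8") as f:
    data = json.load(f)

for key, title in data.items():
    print(f"{key}: {title}")
```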
asked Jan 1 at 18:01 by Ganlas
2 Answers
I ran your code but didn't get the output you describe: you define the key 'title3' as a constant, yet your output shows '\ntitle3', which I couldn't reproduce. As for the stray \u2019 escapes, you were getting them because the file wasn't opened with an explicit 'utf8' encoding and ensure_ascii wasn't set to False. I would suggest two changes: use the 'lxml' parser instead of 'html.parser', and write the file with this snippet:
with open("results.json", "w", encoding='utf8') as write_file:
    json.dump(str(data), write_file, ensure_ascii=False)
This worked for me; both the stray escapes and the ASCII issues were resolved.
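The ensure_ascii behavior this answer describes can be seen in isolation. A minimal sketch with a made-up title containing a curly apostrophe:

```python
import json

title = "Who\u2019ve Outsmarted the Market"  # contains a curly apostrophe

# The default ensure_ascii=True escapes every non-ASCII character...
print(json.dumps({"title": title}))

# ...while ensure_ascii=False writes the character itself,
# so the file must then be opened with a utf-8 encoding.
print(json.dumps({"title": title}, ensure_ascii=False))
```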
import requests
import json
from bs4 import BeautifulSoup

# Getting webpage
page = requests.get("https://finance.yahoo.com/")
soup = BeautifulSoup(page.content, 'html.parser')  # creating instance of class to parse the page

# Getting article titles
title = soup.find_all(class_="Mb(5px)")
desc = soup.find_all(class_="Fz(14px) Lh(19px) Fz(13px)--sm1024 Lh(17px)--sm1024 LineClamp(3,57px) LineClamp(3,51px)--sm1024 M(0)")

# Getting article bodies
page2 = requests.get("https://finance.yahoo.com/news/warren-buffett-suggests-read-19th-204800450.html")
soup2 = BeautifulSoup(page2.content, 'html.parser')
body = soup.find_all(class_="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm", id="15")

title = [x.get_text().strip() for x in title]
limit = len(title)  # change this to 5 if you need only the first 5
data = {"title" + str(i + 1): title[i] for i in range(0, limit)}

with open("results.json", "w", encoding='utf-8') as write_file:
    write_file.write(json.dumps(data, ensure_ascii=False, indent=4))
results.json:
{
"title1": "These 12 Stocks Are the Best Values in 2019, According to Pros Who’ve Outsmarted the Market",
"title2": "These US taxpayers face higher payments thanks to new law",
"title3": "The Best Move You Can Make With Your Investments in 2019, According to 5 Market Professionals",
"title4": "Cramer Remix: Here's where your first $10,000 should be i...",
"title5": "The auto industry said goodbye to a lot of cars in 2018",
"title6": "Ocado Pips Adyen to Take Crown of 2018's Best European Stock",
"title7": "7 Stock Picks From Top-Rated Wall Street Analysts",
"title8": "Buy IBM Stock as It Begins 2019 as the Cheapest Dow Component",
"title9": "$70 Oil Could Be Right Around The Corner",
"title10": "What Is the Highest Credit Score and How Do You Get It?",
"title11": "Silver Price Forecast – Silver markets stall on New Year’s Eve",
"title12": "This Chart Says the S&P 500 Could Rebound in 2019",
"title13": "Should You Buy Some Berkshire Hathaway Stock?",
"title14": "How Much Does a Financial Advisor Cost?",
"title15": "Here Are the World's Biggest Billionaire Winners and Losers of 2018",
"title16": "Tax tips: What you need to know before you file your taxes in 2019",
"title17": "Kevin O’Leary: Make This Your Top New Year’s Resolution",
"title18": "Dakota Access pipeline developer slow to replace some trees",
"title19": "Einhorn's Greenlight Extends Decline to 34% in Worst Year",
"title20": "4 companies to watch in 2019",
"title21": "What Is My Debt-to-Income Ratio?",
"title22": "US recession unlikely, market volatility to continue in 2019, El-Erian says",
"title23": "Fidelity: Ignore stock market turbulence and stick to long-term goals",
"title24": "Tax season: How you can come out a winner",
"title25": "IBD 50 Growth Stocks To Watch"
}
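Since the project eventually needs titles and bodies together, a list of records can be easier for the receiving program to iterate than numbered keys. A minimal sketch with placeholder data (this structure is a suggestion, not part of either answer):

```python
import json

titles = ["Headline one", "Headline two"]
bodies = ["Body text one", "Body text two"]  # placeholder article bodies

# Pair each title with its body as a list of records, so the consumer
# can simply loop over the list instead of guessing key names.
articles = [{"title": t, "body": b} for t, b in zip(titles, bodies)]

print(json.dumps(articles, indent=4, ensure_ascii=False))
```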
answered Jan 1 at 18:56 by Mobasshir Bhuiyan
answered Jan 1 at 19:05, edited Jan 1 at 19:11, by Bitto Bennichan