Complete HTML doesn't render when scraping using bs4 python

I am trying to scrape data from geeksforgeeks for my own simple scraping and analysis project.

I am using bs4 and requests with Python 2.

I need to scrape all the questions at this URL, so I do:

ques_page = requests.get('https://practice.geeksforgeeks.org/explore/?page=1')

ques_soup = BeautifulSoup(ques_page.text, 'lxml')

get_ques = ques_soup.find('div', class_="panel problem-block")

The divs with class panel problem-block contain the question data.

But when I view the scraped HTML, print(ques_page.text) doesn't contain those divs at all!

Viewing the page source (Ctrl-F for problemFeed, the div where all the questions should be) shows:

<div id="problemFeed" class="row" data-masonry-options='{"itemSelector": ".item" }'></div>

This div is EMPTY! So I am not able to scrape any data out of it. How is this possible, since I can see everything inside this div in the browser console, but not in the page source or while scraping?

asked Jan 2 at 11:16

Gagan Ganapathy

It is possible that this part is rendered after the page loads (by JavaScript), so it's not part of the original HTML.

– Ron Serruya
Jan 2 at 11:20

If you open this page in a browser like Chrome and select "view page source", you will see that the class "panel problem-block" doesn't exist there either.

– Chris Doyle
Jan 2 at 11:31

Yes, that's because this class is inside the problemFeed div itself @ChrisDoyle

– Gagan Ganapathy
Jan 2 at 11:32

@RonSerruya so such things are not scrapable at all?

– Gagan Ganapathy
Jan 2 at 11:32

You can scrape the rendered HTML using Selenium.

– Sreyas
Jan 2 at 11:35
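
The point made in the comments above can be checked offline. The following is a minimal sketch, not the asker's code: it parses the static markup quoted in the question and counts the question blocks. The helper name describe_feed and the use of the built-in 'html.parser' (instead of 'lxml') are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# The static markup quoted in the question: the container exists but is empty,
# because the question blocks are injected later by JavaScript in the browser.
STATIC_HTML = '''
<div id="problemFeed" class="row" data-masonry-options='{"itemSelector": ".item" }'></div>
'''


def describe_feed(html):
    """Return (container_found, number_of_question_blocks) for an HTML string.

    describe_feed is a hypothetical helper, not part of bs4 or requests.
    """
    soup = BeautifulSoup(html, 'html.parser')
    feed = soup.find('div', id='problemFeed')
    blocks = soup.find_all('div', class_='panel problem-block')
    return feed is not None, len(blocks)


container_found, block_count = describe_feed(STATIC_HTML)
```

Here the container is found but no question blocks are, which is what requests receives as well: it downloads the HTML as served and never runs the page's scripts.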

python html web-scraping beautifulsoup

1 Answer

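You can get it from the Ajax endpoint with a POST request:

import requests
from bs4 import BeautifulSoup

# Form data sent to the endpoint; for the next page use 2 and 'page2', and so on.
data = {'page': 1, 'query': 'page1'}
ques_page = requests.post('https://practice.geeksforgeeks.org/ajax/practicePageAjax.php', data=data)
ques_soup = BeautifulSoup(ques_page.text, 'lxml')
# find() returns only the first matching div (or None if there is no match).
get_ques = ques_soup.find('div', class_="panel problem-block")
print(get_ques)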

answered Jan 2 at 11:51

ewwink

How did you find the Ajax endpoint?

– Gagan Ganapathy
Jan 2 at 13:13

You can see it in the browser console.

– ewwink
Jan 3 at 2:12
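
Since the question asks for all the questions rather than the first one, the answer's approach could be extended roughly as follows. This is a sketch under assumptions, not a verified client for the site: the payload pattern is inferred from the answer's "# 2, page2..." comment, the helper names (build_payload, extract_blocks, fetch_blocks) are hypothetical, and the endpoint may have changed or may require extra headers.

```python
import requests
from bs4 import BeautifulSoup

# Endpoint taken from the answer above; it may no longer behave this way.
AJAX_URL = 'https://practice.geeksforgeeks.org/ajax/practicePageAjax.php'


def build_payload(page):
    """Form data for a page number, following the answer's page1, page2 pattern."""
    return {'page': page, 'query': 'page%d' % page}


def extract_blocks(html):
    """Return every question block in an HTML fragment (find_all, not find)."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find_all('div', class_='panel problem-block')


def fetch_blocks(page):
    """POST to the Ajax endpoint and parse the returned fragment.

    Performs a network request, so it is defined here but not called.
    """
    response = requests.post(AJAX_URL, data=build_payload(page), timeout=10)
    response.raise_for_status()
    return extract_blocks(response.text)
```

A caller would loop over page numbers, for example calling fetch_blocks(1), fetch_blocks(2), and so on, and stop when a page returns no blocks. Whether the endpoint signals the last page that way is an assumption to verify against the real responses.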
