Complete HTML doesn't render when scraping using bs4 python



























I am trying to scrape data from geeksforgeeks for my own simple scraping and analysis project.



I am using bs4 and requests with Python 2.



I need to scrape all the questions on this URL, so I do:



import requests
from bs4 import BeautifulSoup

ques_page = requests.get('https://practice.geeksforgeeks.org/explore/?page=1')
ques_soup = BeautifulSoup(ques_page.text, 'lxml')
get_ques = ques_soup.find('div', class_="panel problem-block")


The class panel problem-block contains the question data.



But when I view the scraped HTML, the output of print(ques_page.text) doesn't contain the div at all!



In the page source (Ctrl-F for problemFeed; this is the div where all the questions appear), I see:



<div id="problemFeed" class="row" data-masonry-options='{"itemSelector": ".item" }'></div>



This div is EMPTY, so I am not able to scrape any data out of it! How is this possible? I can view everything inside this div in the console, but not in the page source or when scraping.

































  • It is possible that this part is rendered by JavaScript after the page loads, so it's not part of the original HTML.

    – Ron Serruya
    Jan 2 at 11:20











  • If you open this page in a browser like Chrome and select "view page source", you will see that the class "panel problem-block" doesn't exist there either.

    – Chris Doyle
    Jan 2 at 11:31











  • Yes @ChrisDoyle, that's because this class is inside the problemFeed div itself.

    – Gagan Ganapathy
    Jan 2 at 11:32











  • @RonSerruya So such things can't be scraped at all?

    – Gagan Ganapathy
    Jan 2 at 11:32











  • You can scrape the rendered HTML using Selenium.

    – Sreyas
    Jan 2 at 11:35
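
To make the point in the comments concrete, here is a minimal offline sketch. It assumes a simplified, made-up copy of the page source (the `STATIC_HTML` string is illustrative, not the real page) and uses the built-in 'html.parser' so it runs without lxml. It shows that BeautifulSoup only sees the markup the server sent; it never runs the script that would fill the container.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the page source: the container is
# empty, and a script (never executed by requests or BeautifulSoup) would
# normally fill it in the browser.
STATIC_HTML = """
<html><body>
<div id="problemFeed" class="row"></div>
<script>/* the browser would run code here to load the questions */</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, 'html.parser')
feed = soup.find('div', id='problemFeed')

# The container itself is present in the static markup...
print(feed is not None)
# ...but it has no descendant tags at all.
print(len(feed.find_all(True)))
# So the block the question searches for cannot be found.
print(soup.find('div', class_='panel problem-block'))
```

The same thing happens with the text returned by requests.get: the questions are added later by the browser, so they have to be fetched from wherever the script loads them, or the page has to be rendered first.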























python html web-scraping beautifulsoup
















asked Jan 2 at 11:16









Gagan Ganapathy


























1 Answer














You can get it from the Ajax endpoint with a POST request:



import requests
from bs4 import BeautifulSoup

# Form data for the first page; for later pages use 2 and 'page2', and so on
data = {'page': 1, 'query': 'page1'}
ques_page = requests.post('https://practice.geeksforgeeks.org/ajax/practicePageAjax.php', data=data)
ques_soup = BeautifulSoup(ques_page.text, 'lxml')
get_ques = ques_soup.find('div', class_="panel problem-block")
print(get_ques)
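
Since the question asks for all the questions, the snippet above could be extended roughly as follows. This is only a sketch under assumptions: the endpoint and form fields are taken from the answer and may have changed, the helper names (`build_payload`, `extract_blocks`, `scrape_pages`) and the page range are made up for illustration, and stopping at the first empty page is a guess about how the endpoint behaves. It uses find_all instead of find to collect every block on a page, and 'html.parser' so lxml is not required.

```python
import requests
from bs4 import BeautifulSoup

# Endpoint taken from the answer above; it may have changed since.
AJAX_URL = 'https://practice.geeksforgeeks.org/ajax/practicePageAjax.php'


def build_payload(page):
    """Form data for one page, following the {'page': 1, 'query': 'page1'} pattern."""
    return {'page': page, 'query': 'page%d' % page}


def extract_blocks(html):
    """Return every div whose class attribute is exactly 'panel problem-block'."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find_all('div', class_='panel problem-block')


def scrape_pages(first=1, last=3):
    """Hypothetical helper: fetch pages first..last and collect all blocks."""
    blocks = []
    for page in range(first, last + 1):
        response = requests.post(AJAX_URL, data=build_payload(page), timeout=10)
        response.raise_for_status()
        found = extract_blocks(response.text)
        if not found:
            # Assumption: a page with no blocks means there are no more results.
            break
        blocks.extend(found)
    return blocks
```

Calling scrape_pages(1, 5) would then return a list of tags, from which the text can be read with get_text(). Adding a short pause between requests is a reasonable courtesy to the site.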





























  • How did you find the Ajax endpoint?

    – Gagan Ganapathy
    Jan 2 at 13:13











  • You can view it in the browser console.

    – ewwink
    Jan 3 at 2:12





















answered Jan 2 at 11:51









ewwink













