How to get a search results page using requests.post?

I want to get the search results from the web page http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx when the stock code is set to 5.

The problem is that I don't know which URL the search goes to, because pressing Search runs JavaScript.

Also, how do I find the parameters to pass to requests.post (e.g. data)? Are headers needed?
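For reference, the usual way to replay an ASP.NET WebForms search with requests is to GET the form page, copy all of its hidden fields (such as `__VIEWSTATE` and `__EVENTVALIDATION`), add the values a user would type, and POST everything back to the form's `action` URL. The browser's developer tools (Network tab) show the exact fields and headers the real Search button sends. The sketch below assumes that pattern. The field name `ctl00$txt_stock_code` is a guess based on ASP.NET's usual mapping from the element id `ctl00_txt_stock_code`, the helper names are hypothetical, and, as the comments on the answer point out, this particular page may still reject the request if its tokens are computed by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormScraper(HTMLParser):
    """Collects the first form's action URL and every hidden input on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = a.get("action") or ""
        elif tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""


def build_post(page_url, page_html, fields):
    """Return (target URL, form data) for re-submitting the page's form."""
    scraper = FormScraper()
    scraper.feed(page_html)
    data = dict(scraper.hidden)  # __VIEWSTATE, __EVENTVALIDATION, ...
    data.update(fields)          # the values a user would type in
    return urljoin(page_url, scraper.action or ""), data


def search(session, page_url, stock_code):
    """Hypothetical flow with a requests.Session: GET the form, POST it back."""
    page = session.get(page_url)
    # ASSUMPTION: the input with id ctl00_txt_stock_code is named
    # "ctl00$txt_stock_code"; check the real name in your browser's dev tools.
    target, data = build_post(page_url, page.text, {"ctl00$txt_stock_code": stock_code})
    # A Referer header mirrors what the browser sends; it may or may not be required.
    return session.post(target, data=data, headers={"Referer": page_url})
```

Usage (untested against the live site): `with requests.Session() as s: response = search(s, url, "5")`, then parse `response.text`.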


python web-scraping python-requests

asked Nov 22 '18 at 10:45 by Chan, edited Nov 22 '18 at 11:27 by petezurich

Do you want to simulate the POST request that is sent after you enter 5 in the "Stock code" field and press "Search"?

– Andersson
Nov 22 '18 at 10:48

Yes, you are right.

– Chan
Nov 22 '18 at 10:51

Looks like the site has problems when clicking Search.

– mirhossein
Nov 22 '18 at 17:16

The website works. After entering 5 in the stock code field and pressing Search, you can see the page with the results.

– Chan
Nov 23 '18 at 1:09

Can anyone help?

– Chan
Nov 23 '18 at 3:46


1 Answer

You have multiple options:

1) You can use Selenium. First install it:

    pip3 install selenium

Then get a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your OS, you may need to specify the location of the driver; Selenium 4.6 and later can also download a matching driver automatically).

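    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

    # If chromedriver is not on your PATH, pass its location instead:
    # from selenium.webdriver.chrome.service import Service
    # browser = webdriver.Chrome(service=Service("/path/to/chromedriver"))
    browser = webdriver.Chrome()
    try:
        browser.get(url)
        element = browser.find_element(By.ID, 'ctl00_txt_stock_code')  # find the text box
        element.send_keys('5')  # populate the text box
        element.submit()  # submit the form
        # Wait for the form page to be replaced and the results page to finish loading,
        # instead of sleeping for a fixed time
        wait = WebDriverWait(browser, 10)
        wait.until(EC.staleness_of(element))
        wait.until(lambda d: d.execute_script("return document.readyState") == "complete")
        soup = BeautifulSoup(browser.page_source, 'html.parser')
    finally:
        browser.quit()

    for news in soup.find_all(class_='news'):
        print(news.text)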

2) Or use PyQt with QWebEngineView.

Install PyQt on Ubuntu:

    sudo apt-get install python3-pyqt5
    sudo apt-get install python3-pyqt5.qtwebengine

or on other operating systems (64-bit versions of Python); since PyQt5 5.12 the web engine is a separate package:

    pip3 install PyQt5 PyQtWebEngine

Basically, you load the first page, which contains the form, fill in the form by running JavaScript, and then submit it. The loadFinished signal is emitted twice: once for the form page and again for the results page that loads after the form is submitted, so an if statement (the first_pass flag) tells the two calls apart.

    import sys

    from bs4 import BeautifulSoup
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from PyQt5.QtWidgets import QApplication


    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.first_pass = True
            self.app = QApplication(sys.argv)  # must exist before the view is created
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()  # blocks until self.app.quit() is called

        def _load_finished(self, result):
            # Emitted once for the search form and again for the results page
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            # Fill in the stock code, start the release-date range in 1999, then submit
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value = '1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            # toHtml() is asynchronous: the HTML is passed to the callback
            self.page().toHtml(self.callable)

        def callable(self, data):
            self.html = data
            self.app.quit()


    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)
    soup = BeautifulSoup(web.html, 'html.parser')
    for news in soup.find_all(class_='news'):
        print(news.text)

Outputs:

Voting Rights and Capital

Next Day Disclosure Return

NOTICE OF REDEMPTION AND CANCELLATION OF LISTING

THIRD INTERIM DIVIDEND FOR 2018

Notification of Transactions by Persons Discharging Managerial Responsibilities

Next Day Disclosure Return

THIRD INTERIM DIVIDEND FOR 2018

Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018

Voting Rights and Capital

PUBLICATION OF BASE PROSPECTUS SUPPLEMENT

3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL

3Q EARNINGS RELEASE - HIGHLIGHTS

Scrip Dividend Circular

2018 Third Interim Dividend; Scrip Dividend

THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE

NOTIFICATION OF MAJOR HOLDINGS

EARNINGS RELEASE FOR THIRD QUARTER 2018

NOTIFICATION OF MAJOR HOLDINGS

Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018

THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES
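The `first_pass` flag in the `Render` class above only distinguishes two loads. Below is a minimal, Qt-free sketch of the same idea for any number of loads: a hypothetical `LoadSequencer` (not part of the original answer) routes the n-th `loadFinished` call to the n-th step and treats `loadFinished(False)`, which signals a failed load, as the end. With Qt you would wire it up as `view.loadFinished.connect(seq.on_load_finished)`.

```python
from typing import Callable, List


class LoadSequencer:
    """Route the n-th page-load event to the n-th step (hypothetical helper).

    Generalises the first_pass flag: step 0 fills in the form, step 1 reads
    the results, and so on. on_done runs when the steps are used up or a
    load fails.
    """

    def __init__(self, steps: List[Callable[[], None]], on_done: Callable[[], None]):
        self._steps = list(steps)
        self._on_done = on_done
        self.calls = 0  # number of steps already run

    def on_load_finished(self, ok: bool) -> None:
        # QWebEngineView.loadFinished passes False when the load failed
        if not ok or self.calls >= len(self._steps):
            self._on_done()
            return
        step = self._steps[self.calls]
        self.calls += 1
        step()


if __name__ == "__main__":
    log = []
    seq = LoadSequencer(
        [lambda: log.append("fill in form"), lambda: log.append("read results")],
        on_done=lambda: log.append("quit"),
    )
    for _ in range(3):  # simulate three successful loadFinished signals
        seq.on_load_finished(True)
    print(log)  # ['fill in form', 'read results', 'quit']
```

Keeping the steps in a list makes adding a third load (for example a "Next" page) a one-line change instead of another boolean flag.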

Alternatively, you could use Scrapy Splash (https://github.com/scrapy-plugins/scrapy-splash) or Requests-HTML (https://html.python-requests.org/), but I am not sure how you would fill in the form with either of these two approaches.

Update: to read the following result pages, parse each results page and only then click its "Next" button (`ctl00_btnNext2`), stopping after six pages or when there is no "Next" button:

    import sys

    from bs4 import BeautifulSoup
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from PyQt5.QtWidgets import QApplication

    # Clicks the "Next" button if the page has one and reports whether it did
    CLICK_NEXT_JS = """
    (function () {
        var button = document.getElementById('ctl00_btnNext2');
        if (!button) { return false; }
        button.click();
        return true;
    })();
    """


    class Render(QWebEngineView):
        def __init__(self, url, max_pages=6):
            self.count = 0
            self.max_pages = max_pages
            self.first_pass = True
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()

        def _load_finished(self, result):
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value = '1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            # Every later load is a results page: read its HTML first
            self.page().toHtml(self.parse)

        def parse(self, data):
            soup = BeautifulSoup(data, 'html.parser')
            for news in soup.find_all(class_='news'):
                print(news.text)
            self.count += 1
            if self.count >= self.max_pages:
                self.app.quit()
            else:
                # Move on only after this page's HTML has been read
                self.page().runJavaScript(CLICK_NEXT_JS, self._after_next)

        def _after_next(self, clicked):
            # A JavaScript error is not a Python exception, so check the result:
            # no "Next" button means this was the last page
            if not clicked:
                self.app.quit()


    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)

answered Nov 24 '18 at 2:35 by Dan-Dev, edited Feb 12 at 20:41

Thank you, Dan. Selenium solves the problem, but I want to know how to handle a JavaScript-driven page like this one with requests. Are there any faster alternatives?

– Chan
Nov 26 '18 at 1:30

This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it, also generated by JavaScript. Without actually running JavaScript it is virtually impossible to recreate these tokens. There is no way to do this with requests, and I'm not sure how you would run the required JavaScript with Requests-HTML. If you don't like the Selenium option, try the PyQt5 solution I gave in the answer.

– Dan-Dev
Nov 26 '18 at 11:50
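One rough way to check a claim like this yourself is to fetch the page without a browser and see whether the tokens arrive filled in: ASP.NET normally renders `__VIEWSTATE` with its value already set on the server, so an empty or missing token in the raw HTML hints that a script supplies it. This is a heuristic sketch with a hypothetical function name, not something the commenters ran; which token names matter for this page is an assumption.

```python
from html.parser import HTMLParser


class _HiddenInputs(HTMLParser):
    """Collects name -> value for every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.values[a["name"]] = a.get("value") or ""


def token_status(raw_html, names=("__VIEWSTATE", "__EVENTVALIDATION")):
    """Classify each token as 'present', 'empty' or 'missing' in the raw HTML.

    An 'empty' or 'missing' token in HTML fetched without a browser suggests
    that client-side JavaScript supplies it before the form is posted.
    """
    parser = _HiddenInputs()
    parser.feed(raw_html)
    status = {}
    for name in names:
        if name not in parser.values:
            status[name] = "missing"
        elif parser.values[name] == "":
            status[name] = "empty"
        else:
            status[name] = "present"
    return status
```

Usage: `token_status(requests.get(url).text)`, then compare the result with what the browser's Network tab shows being posted.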

How do I read the next page?

– Chan
Feb 12 at 4:03

I updated the post to show how to read the next pages.

– Dan-Dev
Feb 12 at 20:41

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53429181%2fhow-to-get-the-web-page-using-requests-post%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

You have multiple options:

1) You can use Selenium. First install Selenium.

sudo pip3 install selenium

Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)

from selenium import webdriver

from bs4 import BeautifulSoup

import time



browser = webdriver.Chrome()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

browser.get(url)

element = browser.find_element_by_id('ctl00_txt_stock_code')  # find the text box

time.sleep(2)

element.send_keys('5')  # populate the text box

time.sleep(2)

element.submit()  # submit the form

soup = BeautifulSoup(browser.page_source, 'html.parser')

browser.quit()

for news in soup.find_all(class_='news'):

    print(news.text)

2) Or use PyQt with QWebEngineView.

Install PyQt on Ubuntu:

    sudo apt-get install python3-pyqt5

    sudo apt-get install python3-pyqt5.qtwebengine

or on other OS (64 bit versions of Python)

    pip3 install PyQt5

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

        self.html = None

        self.first_pass = True

        self.app = QApplication(sys.argv)

        QWebEngineView.__init__(self)

        self.loadFinished.connect(self._load_finished)

        self.load(QUrl(url))

        self.app.exec_()



    def _load_finished(self, result):

        if self.first_pass:

            self._first_finished()

            self.first_pass = False

        else:

            self._second_finished()



    def _first_finished(self):

        self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

        self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

        self.page().runJavaScript("preprocessMainForm();")

        self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

        self.page().toHtml(self.callable)



    def callable(self, data):

        self.html = data

        self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

soup = BeautifulSoup(web.html, 'html.parser')

for news in soup.find_all(class_ = 'news'):

    print(news.text)

Outputs:

Voting Rights and Capital

Next Day Disclosure Return

NOTICE OF REDEMPTION AND CANCELLATION OF LISTING

THIRD INTERIM DIVIDEND FOR 2018

Notification of Transactions by Persons Discharging Managerial Responsibilities

Next Day Disclosure Return

THIRD INTERIM DIVIDEND FOR 2018

Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018

Voting Rights and Capital

PUBLICATION OF BASE PROSPECTUS SUPPLEMENT

3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL

3Q EARNINGS RELEASE - HIGHLIGHTS

Scrip Dividend Circular

2018 Third Interim Dividend; Scrip Dividend

THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE

NOTIFICATION OF MAJOR HOLDINGS

EARNINGS RELEASE FOR THIRD QUARTER 2018

NOTIFICATION OF MAJOR HOLDINGS

Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018

THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES

Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash

Or Requests-HTML https://html.python-requests.org/ .

But I am not sure how you would fill the form in using these two last approaches.

Updated how to read the next pages:

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

    self.html = None

    self.count = 0

    self.first_pass = True

    self.app = QApplication(sys.argv)

    QWebEngineView.__init__(self)

    self.loadFinished.connect(self._load_finished)

    self.load(QUrl(url))

    self.app.exec_()



    def _load_finished(self, result):

    if self.first_pass:

        self._first_finished()

        self.first_pass = False

    else:

        self._second_finished()



    def _first_finished(self):

    self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

    self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

    self.page().runJavaScript("preprocessMainForm();")

    self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

    try:

        self.page().toHtml(self.parse)

        self.count += 1

        if self.count > 5:

             self.page().toHtml(self.callable)

        else:

            self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")

    except:

        self.page().toHtml(self.callable)



    def parse(self, data):

    soup = BeautifulSoup(data, 'html.parser')

    for news in soup.find_all(class_ = 'news'):

        print(news.text)



    def callable(self, data):

    self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

edited Feb 12 at 20:41

answered Nov 24 '18 at 2:35

Dan-Dev

4,87822033

Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with javascript webpage like this one using requests. Any other faster alternatives?

– Chan
Nov 26 '18 at 1:30

This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it also generated by JavaScript. Without actually running JavaScript is is virtually impossible to recreate these tokens. There is no way to do this with requests and I'm not sure how you would run the require JavaScript with Requests-HTML. If you don't like the Selenium option try the PyQt5 solution I gave in the answer.

– Dan-Dev
Nov 26 '18 at 11:50

How to read the next page?

– Chan
Feb 12 at 4:03

Updated the post with how to read the next pages

– Dan-Dev
Feb 12 at 20:41

add a comment |

You have multiple options:

1) You can use Selenium. First install Selenium.

sudo pip3 install selenium

Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)

from selenium import webdriver

from bs4 import BeautifulSoup

import time



browser = webdriver.Chrome()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

browser.get(url)

element = browser.find_element_by_id('ctl00_txt_stock_code')  # find the text box

time.sleep(2)

element.send_keys('5')  # populate the text box

time.sleep(2)

element.submit()  # submit the form

soup = BeautifulSoup(browser.page_source, 'html.parser')

browser.quit()

for news in soup.find_all(class_='news'):

    print(news.text)

2) Or use PyQt with QWebEngineView.

Install PyQt on Ubuntu:

    sudo apt-get install python3-pyqt5

    sudo apt-get install python3-pyqt5.qtwebengine

or on other OS (64 bit versions of Python)

    pip3 install PyQt5

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

        self.html = None

        self.first_pass = True

        self.app = QApplication(sys.argv)

        QWebEngineView.__init__(self)

        self.loadFinished.connect(self._load_finished)

        self.load(QUrl(url))

        self.app.exec_()



    def _load_finished(self, result):

        if self.first_pass:

            self._first_finished()

            self.first_pass = False

        else:

            self._second_finished()



    def _first_finished(self):

        self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

        self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

        self.page().runJavaScript("preprocessMainForm();")

        self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

        self.page().toHtml(self.callable)



    def callable(self, data):

        self.html = data

        self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

soup = BeautifulSoup(web.html, 'html.parser')

for news in soup.find_all(class_ = 'news'):

    print(news.text)

Outputs:

Voting Rights and Capital

Next Day Disclosure Return

NOTICE OF REDEMPTION AND CANCELLATION OF LISTING

THIRD INTERIM DIVIDEND FOR 2018

Notification of Transactions by Persons Discharging Managerial Responsibilities

Next Day Disclosure Return

THIRD INTERIM DIVIDEND FOR 2018

Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018

Voting Rights and Capital

PUBLICATION OF BASE PROSPECTUS SUPPLEMENT

3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL

3Q EARNINGS RELEASE - HIGHLIGHTS

Scrip Dividend Circular

2018 Third Interim Dividend; Scrip Dividend

THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE

NOTIFICATION OF MAJOR HOLDINGS

EARNINGS RELEASE FOR THIRD QUARTER 2018

NOTIFICATION OF MAJOR HOLDINGS

Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018

THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES

Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash

Or Requests-HTML https://html.python-requests.org/ .

But I am not sure how you would fill the form in using these two last approaches.

Updated how to read the next pages:

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

    self.html = None

    self.count = 0

    self.first_pass = True

    self.app = QApplication(sys.argv)

    QWebEngineView.__init__(self)

    self.loadFinished.connect(self._load_finished)

    self.load(QUrl(url))

    self.app.exec_()



    def _load_finished(self, result):

    if self.first_pass:

        self._first_finished()

        self.first_pass = False

    else:

        self._second_finished()



    def _first_finished(self):

    self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

    self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

    self.page().runJavaScript("preprocessMainForm();")

    self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

    try:

        self.page().toHtml(self.parse)

        self.count += 1

        if self.count > 5:

             self.page().toHtml(self.callable)

        else:

            self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")

    except:

        self.page().toHtml(self.callable)



    def parse(self, data):

    soup = BeautifulSoup(data, 'html.parser')

    for news in soup.find_all(class_ = 'news'):

        print(news.text)



    def callable(self, data):

    self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

edited Feb 12 at 20:41

answered Nov 24 '18 at 2:35

Dan-Dev

4,87822033

Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with javascript webpage like this one using requests. Any other faster alternatives?

– Chan
Nov 26 '18 at 1:30

This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it also generated by JavaScript. Without actually running JavaScript is is virtually impossible to recreate these tokens. There is no way to do this with requests and I'm not sure how you would run the require JavaScript with Requests-HTML. If you don't like the Selenium option try the PyQt5 solution I gave in the answer.

– Dan-Dev
Nov 26 '18 at 11:50

How to read the next page?

– Chan
Feb 12 at 4:03

Updated the post with how to read the next pages

– Dan-Dev
Feb 12 at 20:41

add a comment |

You have multiple options:

1) You can use Selenium. First install Selenium.

sudo pip3 install selenium

Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)

from selenium import webdriver

from bs4 import BeautifulSoup

import time



browser = webdriver.Chrome()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

browser.get(url)

element = browser.find_element_by_id('ctl00_txt_stock_code')  # find the text box

time.sleep(2)

element.send_keys('5')  # populate the text box

time.sleep(2)

element.submit()  # submit the form

soup = BeautifulSoup(browser.page_source, 'html.parser')

browser.quit()

for news in soup.find_all(class_='news'):

    print(news.text)

2) Or use PyQt with QWebEngineView.

Install PyQt on Ubuntu:

    sudo apt-get install python3-pyqt5

    sudo apt-get install python3-pyqt5.qtwebengine

or on other OS (64 bit versions of Python)

    pip3 install PyQt5

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

        self.html = None

        self.first_pass = True

        self.app = QApplication(sys.argv)

        QWebEngineView.__init__(self)

        self.loadFinished.connect(self._load_finished)

        self.load(QUrl(url))

        self.app.exec_()



    def _load_finished(self, result):

        if self.first_pass:

            self._first_finished()

            self.first_pass = False

        else:

            self._second_finished()



    def _first_finished(self):

        self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

        self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

        self.page().runJavaScript("preprocessMainForm();")

        self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

        self.page().toHtml(self.callable)



    def callable(self, data):

        self.html = data

        self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

soup = BeautifulSoup(web.html, 'html.parser')

for news in soup.find_all(class_ = 'news'):

    print(news.text)

Outputs:

Voting Rights and Capital

Next Day Disclosure Return

NOTICE OF REDEMPTION AND CANCELLATION OF LISTING

THIRD INTERIM DIVIDEND FOR 2018

Notification of Transactions by Persons Discharging Managerial Responsibilities

Next Day Disclosure Return

THIRD INTERIM DIVIDEND FOR 2018

Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018

Voting Rights and Capital

PUBLICATION OF BASE PROSPECTUS SUPPLEMENT

3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL

3Q EARNINGS RELEASE - HIGHLIGHTS

Scrip Dividend Circular

2018 Third Interim Dividend; Scrip Dividend

THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE

NOTIFICATION OF MAJOR HOLDINGS

EARNINGS RELEASE FOR THIRD QUARTER 2018

NOTIFICATION OF MAJOR HOLDINGS

Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018

THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES

Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash

Or Requests-HTML https://html.python-requests.org/ .

But I am not sure how you would fill the form in using these two last approaches.

Updated how to read the next pages:

import sys

from PyQt5.QtWidgets import QApplication

from PyQt5.QtCore import QUrl

from PyQt5.QtWebEngineWidgets import QWebEngineView

from bs4 import BeautifulSoup





class Render(QWebEngineView):

    def __init__(self, url):

    self.html = None

    self.count = 0

    self.first_pass = True

    self.app = QApplication(sys.argv)

    QWebEngineView.__init__(self)

    self.loadFinished.connect(self._load_finished)

    self.load(QUrl(url))

    self.app.exec_()



    def _load_finished(self, result):

    if self.first_pass:

        self._first_finished()

        self.first_pass = False

    else:

        self._second_finished()



    def _first_finished(self):

    self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")

    self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")

    self.page().runJavaScript("preprocessMainForm();")

    self.page().runJavaScript("document.forms[0].submit();")



    def _second_finished(self):

    try:

        self.page().toHtml(self.parse)

        self.count += 1

        if self.count > 5:

             self.page().toHtml(self.callable)

        else:

            self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")

    except:

        self.page().toHtml(self.callable)



    def parse(self, data):

    soup = BeautifulSoup(data, 'html.parser')

    for news in soup.find_all(class_ = 'news'):

        print(news.text)



    def callable(self, data):

    self.app.quit()



url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"

web = Render(url)

edited Feb 12 at 20:41

answered Nov 24 '18 at 2:35

Dan-Dev

4,87822033


Thank you, Dan. Selenium solves the problem, but I'd like to know how to handle a JavaScript-driven web page like this one using requests. Are there any faster alternatives?

– Chan
Nov 26 '18 at 1:30

This particular web page has a VIEWSTATE token generated by JavaScript, and an encrypted version of it also generated by JavaScript. Without actually running the JavaScript it is virtually impossible to recreate these tokens. There is no way to do this with requests alone, and I'm not sure how you would run the required JavaScript with Requests-HTML. If you don't like the Selenium option, try the PyQt5 solution I gave in the answer.

– Dan-Dev
Nov 26 '18 at 11:50

How do I read the next page?

– Chan
Feb 12 at 4:03

Updated the post to show how to read the next pages.

– Dan-Dev
Feb 12 at 20:41
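The question also asked how to find the parameters to pass as `data=` to `requests.post`. On an ordinary ASP.NET WebForms page, the POST body is mostly the form's own fields, including hidden ones such as `__VIEWSTATE`. A common first step is therefore to list the named `<input>` elements in the fetched HTML.

The following is a minimal sketch of that step using the standard library's `html.parser`. The sample markup, the helper name `input_fields`, and the field name `ctl00$txt_stock_code` are all hypothetical. ASP.NET typically names a control with `$` where its id uses `_`, but check the real page in your browser's developer tools.

```python
from html.parser import HTMLParser


class InputFieldCollector(HTMLParser):
    """Collect the name/value pairs of every named <input> in a page.

    Simplified on purpose: <select> and <textarea> elements, checkbox and
    radio states, and submit buttons are not handled the way a browser would.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            # A missing or empty value attribute becomes an empty string.
            self.fields[name] = attrs.get("value") or ""


def input_fields(html):
    """Return {name: value} for the named <input> elements in `html`."""
    collector = InputFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Hypothetical markup in the style of an ASP.NET WebForms search page.
sample = """
<form method="post" action="search_active_main.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz" />
  <input type="text" name="ctl00$txt_stock_code" id="ctl00_txt_stock_code" />
</form>
"""

payload = input_fields(sample)
payload["ctl00$txt_stock_code"] = "5"  # the stock code a user would type
print(payload)
# {'__VIEWSTATE': 'abc123', '__EVENTVALIDATION': 'xyz', 'ctl00$txt_stock_code': '5'}
```

To use this in practice, you would:

1. Pass `requests.get(url).text` to `input_fields`.
2. Set the search field in the returned dict.
3. Send the dict with `requests.post(url, data=payload)`.

The Network tab of the browser's developer tools shows the exact form data a real search sends, which is the other way to find these parameters. As the comment above explains, this particular page generates its tokens in JavaScript, so a replayed POST like this is likely to be rejected there. The sketch only shows where the parameters come from.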
