How to get the web page using requests.post?
I want to get the results page of http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx with the stock code set to 5.
The problem is that I don't know which URL the search leads to, because pressing Search runs JavaScript.
Furthermore, how do I find the parameters that need to be passed to requests.post, e.g. data? Is a header needed?
python web-scraping python-requests
asked Nov 22 '18 at 10:45 by Chan, edited Nov 22 '18 at 11:27 by petezurich
Do you want to simulate the POST request that is sent after you enter 5 in the "Stock code" field and press "Search"? – Andersson Nov 22 '18 at 10:48
Yes, you are right. – Chan Nov 22 '18 at 10:51
It looks like the site has problems when clicking search. – mirhossein Nov 22 '18 at 17:16
The website works. After inputting 5 in the stock code field and pressing search, you can view the page that shows the results. – Chan Nov 23 '18 at 1:09
Can anyone help? – Chan Nov 23 '18 at 3:46
1 Answer
You have multiple options:
1) You can use Selenium. First install Selenium.
sudo pip3 install selenium
Then download a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your OS, you may need to specify the location of your driver).
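For example, if chromedriver is not on your PATH you can point Selenium at it explicitly; the path below is just a placeholder, adjust it to wherever you unpacked the driver:

browser = webdriver.Chrome(executable_path='/path/to/chromedriver')  # placeholder path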
from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome()
url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
browser.get(url)

element = browser.find_element_by_id('ctl00_txt_stock_code')  # find the text box
time.sleep(2)
element.send_keys('5')  # populate the text box
time.sleep(2)
element.submit()  # submit the form

soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()

for news in soup.find_all(class_='news'):
    print(news.text)
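As a side note, the fixed time.sleep(2) calls can be replaced with explicit waits, which are usually more reliable. A minimal sketch, reusing the same element id and the same 'news' class as above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

browser = webdriver.Chrome()
browser.get("http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx")
wait = WebDriverWait(browser, 10)  # wait up to 10 seconds for each condition

# wait for the stock code box, fill it in and submit the form
element = wait.until(EC.presence_of_element_located((By.ID, 'ctl00_txt_stock_code')))
element.send_keys('5')
element.submit()

# wait until at least one result row with class 'news' appears, then parse
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'news')))
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()
for news in soup.find_all(class_='news'):
    print(news.text)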
2) Or use PyQt with QWebEngineView.
Install PyQt on Ubuntu:
sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine
or on other OSes (64-bit versions of Python):
pip3 install PyQt5
Basically, you load the first page containing the form, fill it in by running JavaScript, and then submit it. The loadFinished() signal is called twice, the second time because you submitted the form, so you can use an if statement to differentiate between the two calls.
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
    def __init__(self, url):
        self.html = None
        self.first_pass = True
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._load_finished)
        self.load(QUrl(url))
        self.app.exec_()

    def _load_finished(self, result):
        if self.first_pass:
            self._first_finished()
            self.first_pass = False
        else:
            self._second_finished()

    def _first_finished(self):
        self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
        self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
        self.page().runJavaScript("preprocessMainForm();")
        self.page().runJavaScript("document.forms[0].submit();")

    def _second_finished(self):
        self.page().toHtml(self.callable)

    def callable(self, data):
        self.html = data
        self.app.quit()


url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)
soup = BeautifulSoup(web.html, 'html.parser')
for news in soup.find_all(class_='news'):
    print(news.text)
Outputs:
Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES
Alternatively, you can use Scrapy Splash (https://github.com/scrapy-plugins/scrapy-splash) or Requests-HTML (https://html.python-requests.org/). But I am not sure how you would fill in the form using those two approaches.
Updated to show how to read the next pages:
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
    def __init__(self, url):
        self.html = None
        self.count = 0
        self.first_pass = True
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._load_finished)
        self.load(QUrl(url))
        self.app.exec_()

    def _load_finished(self, result):
        if self.first_pass:
            self._first_finished()
            self.first_pass = False
        else:
            self._second_finished()

    def _first_finished(self):
        # fill in the search form and submit it
        self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
        self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
        self.page().runJavaScript("preprocessMainForm();")
        self.page().runJavaScript("document.forms[0].submit();")

    def _second_finished(self):
        try:
            # parse the current page of results
            self.page().toHtml(self.parse)
            self.count += 1
            if self.count > 5:
                self.page().toHtml(self.callable)
            else:
                # click "Next" to load the next page of results
                self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
        except:
            self.page().toHtml(self.callable)

    def parse(self, data):
        soup = BeautifulSoup(data, 'html.parser')
        for news in soup.find_all(class_='news'):
            print(news.text)

    def callable(self, data):
        self.app.quit()


url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)
answered Nov 24 '18 at 2:35 by Dan-Dev, edited Feb 12 at 20:41
Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with JavaScript web pages like this one using requests. Any other faster alternatives? – Chan Nov 26 '18 at 1:30
This particular web page has a VIEWSTATE token generated by JavaScript, and an encrypted version of it also generated by JavaScript. Without actually running JavaScript it is virtually impossible to recreate these tokens. There is no way to do this with requests, and I'm not sure how you would run the required JavaScript with Requests-HTML. If you don't like the Selenium option, try the PyQt5 solution I gave in the answer. – Dan-Dev Nov 26 '18 at 11:50
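To make the comment above concrete: on an ordinary ASP.NET WebForms page whose hidden __VIEWSTATE fields are already present in the initial HTML, the usual requests pattern is to fetch the form, echo back every hidden input, add your own field values, and POST them all back. A rough sketch of that pattern follows; it is not expected to work on this particular page (the tokens here are built by JavaScript, as Dan-Dev explains), and the 'ctl00$txt_stock_code' POST name is only an assumption based on the element id used above.

import requests
from bs4 import BeautifulSoup

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
session = requests.Session()  # keeps cookies between the GET and the POST

# fetch the form page and collect every hidden input it already contains
page = session.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(page.text, 'html.parser')
data = {tag.get('name'): tag.get('value', '')
        for tag in soup.find_all('input', type='hidden') if tag.get('name')}

# add the visible field(s); the POST name below is an assumed ASP.NET-style name
data['ctl00$txt_stock_code'] = '5'

response = session.post(url, data=data, headers={'User-Agent': 'Mozilla/5.0'})
print(response.status_code)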
How to read the next page? – Chan Feb 12 at 4:03
Updated the post with how to read the next pages. – Dan-Dev Feb 12 at 20:41