BeautifulSoup does not pick up individual tags
I'm trying to set up a web scraper for the following page:
https://www.autozone.com/external-engine/oil-filter?pageNumber=1
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

# connect and download html
data = 'https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1'
uclient = urlopen(data)
pagehtml = uclient.read()
uclient.close()
articles = bs(pagehtml, 'html.parser')
# separate data by shop items
containers = articles.find_all('div', {'class': 'shelfItem'})
However, when I try to grab the price, nothing is found:
containers[0].find_all('div',{'class':'price'})
...while inspecting the website with my browser shows the following:
<div class="price" id="retailpricediv_663653_0" style="height: 85px;">Price: <strong>$8.99</strong><br>
How can I grab that $8.99?
Thanks
python web-scraping beautifulsoup jupyter-notebook
asked Nov 19 '18 at 20:45
Berbatov
From having a quick look at the source, and fiddling in the console, although there are divs with a class of price, none are inside a div with class shelfItem (although the latter do exist). It appears that you want shelfItemDetails instead of simply shelfItem.
– Robin Zigmond
Nov 19 '18 at 20:57
Thanks for the response. I tried both, but in either case, I just get an empty div (e.g. id="retailpricediv_663650_2" for the first article), with no price inside
– Berbatov
Nov 19 '18 at 21:04
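One way to confirm what these comments describe is to look at the price div in the HTML that was actually downloaded, rather than in the browser's inspector. The following is only a sketch using the standard library: both HTML strings are hand-written stand-ins modelled on the ids quoted above (not real server output), and DivTextCollector and price_texts are hypothetical helper names.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text found inside each <div class="price"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside the current price div
        self.current_id = None  # id of the price div being read
        self.texts = {}         # div id -> text seen inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            # Already inside a price div: only track nested divs.
            if tag == "div":
                self.depth += 1
        elif tag == "div" and "price" in (attrs.get("class") or "").split():
            self.depth = 1
            self.current_id = attrs.get("id")
            self.texts[self.current_id] = ""

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[self.current_id] += data


def price_texts(html):
    """Return {div id: stripped text} for every price div in html."""
    parser = DivTextCollector()
    parser.feed(html)
    parser.close()
    return {div_id: text.strip() for div_id, text in parser.texts.items()}


# Assumed shape of the downloaded page: the div exists but is still empty.
raw_html = '<div class="shelfItem"><div class="price" id="retailpricediv_663650_2"></div></div>'
# Assumed shape of what the inspector shows once the page's scripts have run.
rendered_html = '<div class="price" id="retailpricediv_663653_0">Price: <strong>$8.99</strong><br></div>'

print(price_texts(raw_html))
print(price_texts(rendered_html))
```

If the downloaded HTML gives an empty string for the div while the inspector shows a price, the value is being filled in after page load, and no change of selector will find it in the static HTML.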
3 Answers
I think the prices are loaded by JavaScript, so you will need a method like Selenium to ensure the values are present (or an API call, as shown in the other answer).
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1")
# find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector, which Selenium 4 removed
products = driver.find_elements(By.CSS_SELECTOR, '.prodName')
prices = driver.find_elements(By.CSS_SELECTOR, '.price[id*=retailpricediv]')
productList = []
priceList = []
for product, price in zip(products, prices):
    productList.append(product.text)
    # keep only the first line of the price text and drop the "Price: " label
    priceList.append(price.text.split('\n')[0].replace('Price: ', ''))
df = pd.DataFrame({'Product': productList, 'Price': priceList})
print(df)
driver.quit()
edited Nov 19 '18 at 21:58
answered Nov 19 '18 at 21:29
QHarr
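The text clean-up inside the loop of the answer above can be pulled out into a small function that is easy to check without starting a browser. This is only a sketch: it assumes the element text starts with a line like the "Price: $8.99" markup quoted in the question, clean_price is a hypothetical helper name, and the sample strings are invented.

```python
def clean_price(element_text):
    """Return the first line of a price element's text without the 'Price: ' label."""
    lines = element_text.strip().splitlines()
    first_line = lines[0] if lines else ""
    return first_line.replace("Price: ", "").strip()


# Invented sample texts standing in for price.text from Selenium.
print(clean_price("Price: $8.99\nFree pickup today"))
print(clean_price("Price: $10.99"))
```

Using splitlines() instead of split('\n') also copes with text that is empty or ends in a different line terminator.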
You can get the required prices by calling the API directly:
import requests

url = 'https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'}
# arg1 is the store number, arg3 is a comma-separated list of item ids
data = {'arg1': 6997, 'arg2': '', 'arg3': '663653,663636,663650,5531,663637,663639,644036,663658,663641,835241,663645,663642', 'arg4': ''}
response = requests.post(url, headers=headers, data=data).json()
for item in response['atgResponse']:
    print(item['retailPrice'])
Output:
8.99
8.99
10.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99
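The loop in this answer relies on only two keys, atgResponse and retailPrice. The sketch below runs the same extraction on a hand-written dictionary so it can be tried offline. The sample_response values and the extract_prices name are made up for illustration, and the real response may contain other fields or items without a price.

```python
def extract_prices(response):
    """Return the retailPrice of every item in response['atgResponse'] that has one."""
    prices = []
    for item in response.get("atgResponse", []):
        price = item.get("retailPrice")
        # Skip items without a price instead of raising KeyError.
        if price is not None:
            prices.append(price)
    return prices


# Hand-written stand-in for requests.post(...).json(); not real API output.
sample_response = {"atgResponse": [{"retailPrice": 8.99}, {"retailPrice": 10.99}, {}]}
print(extract_prices(sample_response))
```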
To create the data dict, you need to pass the store number as arg1 and the list of item ids as arg3...
You can get the arg1 value once, but arg3 should be extracted on each page:
from bs4 import BeautifulSoup as bs

page_url = 'https://www.autozone.com/external-engine/oil-filter?pageNumber=1'
r = requests.get(page_url, headers=headers)
source = bs(r.text, 'html.parser')
arg1 = source.find('div', {'id': 'myStoreNum'}).text
# collect the item ids of every product listed on the page
arg3 = ",".join([_id['id'].strip('azid') for _id in source.find_all('div', {'class': 'categorizedShelfItem'})])
So now you can define data without hardcoding values:
data = {'arg1': arg1, 'arg2': '', 'arg3': arg3, 'arg4': ''}
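One detail about the id extraction above: str.strip('azid') removes any of the characters a, z, i and d from both ends of the string, rather than a literal prefix. That is harmless while the remainder is purely numeric, but a regular expression states the intent more directly. A sketch under the assumption that the element ids are an item number with some letters attached (the sample ids below are invented to match that guess); item_number and build_payload are hypothetical helper names.

```python
import re


def item_number(element_id):
    """Return the first run of digits in an element id, or None if there is none."""
    match = re.search(r"\d+", element_id)
    return match.group(0) if match else None


def build_payload(store_number, element_ids):
    """Build the form data dict from a store number and a list of element ids."""
    numbers = [n for n in (item_number(e) for e in element_ids) if n]
    return {"arg1": store_number, "arg2": "", "arg3": ",".join(numbers), "arg4": ""}


# Assumed id format; the real attribute values may differ.
ids = ["azid663653", "azid663636", "azid5531"]
print(build_payload("6997", ids))
```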
To get values from the next page, just change pageNumber=1 to pageNumber=2 in page_url; the rest of the code remains the same.
edited Nov 19 '18 at 22:27
answered Nov 19 '18 at 21:27
Andersson
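Changing pageNumber by hand, as the answer above describes, can also be done by rewriting the query string. A sketch using only urllib.parse; page_url_for is a hypothetical helper, and since the number of available pages is not stated anywhere here, the range below is just an example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url_for(url, page_number):
    """Return url with its pageNumber query parameter set to page_number."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["pageNumber"] = str(page_number)
    return urlunsplit(parts._replace(query=urlencode(query)))


page_url = "https://www.autozone.com/external-engine/oil-filter?pageNumber=1"
for number in range(1, 4):
    print(page_url_for(page_url, number))
```

Each generated URL would then go through the same steps as before: fetch the page, extract the ids for arg3, and post them to the pricing endpoint.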
You can peel the same apple in different ways. Here is another approach using Selenium:
from contextlib import closing

from selenium import webdriver
from selenium.webdriver.common.by import By

# closing() calls driver.close() when the block exits
with closing(webdriver.Chrome()) as driver:
    driver.get("https://www.autozone.com/external-engine/oil-filter?pageNumber=1")
    # find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector, which Selenium 4 removed
    for items in driver.find_elements(By.CSS_SELECTOR, "[typeof='Product']"):
        price = items.find_element(By.CSS_SELECTOR, '.price > strong').text
        print(price)
Output:
$8.99
$8.99
$10.99
$8.99
$8.99
and so on ....
answered Nov 20 '18 at 6:05
SIM
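The values printed by the Selenium answers are strings with a currency symbol. If numbers are needed, for sorting or averaging, they can be converted afterwards. A sketch, assuming US-style prices like those in the output above; parse_price is a hypothetical helper name.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a string such as '$8.99' or '$1,234.50' to Decimal; None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before handing the digits to Decimal.
    return Decimal(match.group(0).replace(",", ""))


prices = [parse_price(p) for p in ["$8.99", "$8.99", "$10.99"]]
print(min(prices), max(prices))
```

Decimal is used rather than float so that values such as 8.99 are kept exactly as written.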