BeautifulSoup does not pick up individual tags

I'm trying to set up a web scraper for the following page:
https://www.autozone.com/external-engine/oil-filter?pageNumber=1

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

# connect and download html
data = 'https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1'
uclient = urlopen(data)
pagehtml = uclient.read()
uclient.close()
articles = bs(pagehtml, 'html.parser')

# separate data by shop items
containers = articles.find_all('div', {'class': 'shelfItem'})

However, when I try to grab the price, nothing is found:

containers[0].find_all('div',{'class':'price'})

...while inspecting the website with my browser shows the following:

<div class="price" id="retailpricediv_663653_0" style="height: 85px;">Price: <strong>$8.99</strong><br>

How can I grab that $8.99?

Thanks

asked Nov 19 '18 at 20:45

Berbatov

From having a quick look at the source, and fiddling in the console, although there are divs with a class of price, none are inside a div with class shelfItem (although the latter do exist). It appears that you want shelfItemDetails instead of simply shelfItem
– Robin Zigmond
Nov 19 '18 at 20:57

Thanks for the response. I tried both, but in either case, I just get an empty div (e.g. id="retailpricediv_663650_2" for the first article), with no price inside
– Berbatov
Nov 19 '18 at 21:04
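
The empty div described in the comment above is what a parser sees when the price is filled in later by a script. Here is a minimal sketch of that check. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and both HTML strings are illustrative stand-ins (the first mimics the empty div from the comment, the second the markup shown by the browser inspector), not real responses from the site. The names `PriceDivText` and `price_texts` are hypothetical.

```python
from html.parser import HTMLParser


class PriceDivText(HTMLParser):
    """Collect the text found inside <div class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a price div (counts nested divs)
        self.texts = []  # one string per price div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif "price" in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def price_texts(html):
    parser = PriceDivText()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Illustrative stand-ins, not real responses from the site.
static_html = '<div class="price" id="retailpricediv_663650_2"></div>'
rendered_html = ('<div class="price" id="retailpricediv_663653_0">'
                 'Price: <strong>$8.99</strong><br></div>')

print(price_texts(static_html))    # ['']
print(price_texts(rendered_html))  # ['Price: $8.99']
```

If the downloaded HTML gives empty strings like the first case, the price is most likely not in the server response at all, and no choice of selector will find it.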

python web-scraping beautifulsoup jupyter-notebook

3 Answers

I think the prices are loaded by JavaScript, so you will need something like Selenium to make sure the values are present (or an API call, as shown in the other answer).

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1")
# find_elements_by_css_selector is the Selenium 3 API; Selenium 4 uses
# driver.find_elements(By.CSS_SELECTOR, ...) instead.
products = driver.find_elements_by_css_selector('.prodName')
prices = driver.find_elements_by_css_selector('.price[id*=retailpricediv]')

productList = []
priceList = []
for product, price in zip(products, prices):
    productList.append(product.text)
    # keep only the first line of the element text and drop the "Price: " label
    priceList.append(price.text.split('\n')[0].replace('Price: ', ''))

df = pd.DataFrame({'Product': productList, 'Price': priceList})
print(df)

driver.quit()
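
The price cleanup inside the loop above can be pulled out into a small helper, which makes it easy to try on sample strings without starting a browser. This is only a sketch: it assumes the element text begins with a line like `Price: $8.99` (based on the HTML shown in the question), and `clean_price` is a hypothetical name, not part of Selenium.

```python
def clean_price(element_text):
    """Return the price portion of a price element's text, e.g. '$8.99'."""
    # Only the first line carries the price; later lines are ignored.
    first_line = element_text.split("\n")[0]
    # Drop the label and any stray whitespace around the amount.
    return first_line.replace("Price: ", "").strip()


# Assumed shape of the element text, for illustration only.
sample = "Price: $8.99\nadditional text on later lines"
print(clean_price(sample))  # $8.99
```

Inside the loop you would then call `clean_price(price.text)`.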

edited Nov 19 '18 at 21:58

answered Nov 19 '18 at 21:29

QHarr

You can get the required price data by calling the API directly:

import requests

url = 'https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'}
data = {'arg1': 6997, 'arg2': '', 'arg3': '663653,663636,663650,5531,663637,663639,644036,663658,663641,835241,663645,663642', 'arg4': ''}
response = requests.post(url, headers=headers, data=data).json()

for item in response['atgResponse']:
    print(item['retailPrice'])

Output:

To build the data dict, pass the store number as arg1 and a comma-separated list of the item ids as arg3.

You only need to get the arg1 value once, but arg3 has to be extracted from each page:

from bs4 import BeautifulSoup as bs

page_url = 'https://www.autozone.com/external-engine/oil-filter?pageNumber=1'
r = requests.get(page_url, headers=headers)
source = bs(r.text, 'html.parser')
arg1 = source.find('div', {'id': 'myStoreNum'}).text
# strip('azid') removes the characters a, z, i, d from both ends of each id
arg3 = ",".join([_id['id'].strip('azid') for _id in source.find_all('div', {'class': 'categorizedShelfItem'})])

Now you can define data without hardcoding values:

data = {'arg1': arg1, 'arg2':'', 'arg3': arg3, 'arg4': ''}
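
One detail in the arg3 step above: `str.strip('azid')` strips any of the characters a, z, i, d from both ends of the string. It does not remove the literal prefix "azid". That works as long as the rest of the id is purely numeric. Here is a sketch of a stricter version, assuming the element ids have the form "azid" followed by the item number (inferred from the `strip` call, not confirmed against the page). `item_id` and `build_arg3` are hypothetical names.

```python
def item_id(element_id, prefix="azid"):
    """Remove a literal prefix from an element id, leaving the rest untouched."""
    if element_id.startswith(prefix):
        return element_id[len(prefix):]
    return element_id


def build_arg3(element_ids):
    """Join the item numbers into the comma-separated string used as arg3."""
    return ",".join(item_id(element_id) for element_id in element_ids)


# Assumed id format, for illustration only.
ids = ["azid663653", "azid663636", "azid5531"]
print(build_arg3(ids))              # 663653,663636,5531

# Difference between the two approaches on a non-numeric tail:
print("azid12ad".strip("azid"))     # 12
print(item_id("azid12ad"))          # 12ad
```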

To get values from the next page, just change pageNumber=1 to pageNumber=2 in page_url; the rest of the code stays the same.
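
If you want to walk through several pages, the pageNumber change can be done programmatically instead of editing the string by hand. This is a sketch using only the standard library. `with_page_number` is a hypothetical helper, and the range of pages is arbitrary, since the source does not say how many pages the site has.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page_number(url, page):
    """Return url with its pageNumber query parameter set to page."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["pageNumber"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://www.autozone.com/external-engine/oil-filter?pageNumber=1"
page_urls = [with_page_number(base, n) for n in range(1, 4)]
for page_url in page_urls:
    print(page_url)
```

Each `page_url` can then go through the same request and arg3 extraction as before.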

edited Nov 19 '18 at 22:27

answered Nov 19 '18 at 21:27

Andersson

You can peel the same apple in different ways. Here is another approach using selenium:

from selenium import webdriver
from contextlib import closing

with closing(webdriver.Chrome()) as driver:
    driver.get("https://www.autozone.com/external-engine/oil-filter?pageNumber=1")
    for items in driver.find_elements_by_css_selector("[typeof='Product']"):
        price = items.find_element_by_css_selector('.price > strong').text
        print(price)

Output:

$8.99
$8.99
$10.99
$8.99
$8.99
and so on ....

answered Nov 20 '18 at 6:05

SIM
