BeautifulSoup does not pick up individual tags












I'm trying to set up a web scraper for the following page:
https://www.autozone.com/external-engine/oil-filter?pageNumber=1



#connect and download html
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

data = 'https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1'
uclient = urlopen(data)
pagehtml = uclient.read()
uclient.close()
articles = bs(pagehtml, 'html.parser')

#separate data by shop items
containers = articles.find_all('div', {'class': 'shelfItem'})


However, when I try to grab the price, nothing is found:



containers[0].find_all('div',{'class':'price'})


...while inspecting the website with my browser shows the following:



<div class="price" id="retailpricediv_663653_0" style="height: 85px;">Price: <strong>$8.99</strong><br>


How can I grab that $8.99?



Thanks
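For what it's worth, the extraction logic itself is fine: if the fragment quoted above were actually present in the downloaded HTML, pulling the price out would be straightforward. A minimal standard-library sketch (the snippet variable below is just the fragment from the browser inspector; the assumption is that only the fully rendered page contains it, not what urlopen receives):

import re

# the fragment quoted from the browser inspector
snippet = '<div class="price" id="retailpricediv_663653_0" style="height: 85px;">Price: <strong>$8.99</strong><br>'

# grab the first dollar amount, e.g. "$8.99"
match = re.search(r'\$\d+\.\d{2}', snippet)
print(match.group(0))  # -> $8.99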
































  • From having a quick look at the source, and fiddling in the console, although there are divs with a class of price, none are inside a div with class shelfItem (although the latter do exist). It appears that you want shelfItemDetails instead of simply shelfItem
    – Robin Zigmond
    Nov 19 '18 at 20:57










  • Thanks for the response. I tried both, but in either case, I just get an empty div (e.g. id="retailpricediv_663650_2" for the first article), with no price inside
    – Berbatov
    Nov 19 '18 at 21:04
















python web-scraping beautifulsoup jupyter-notebook






asked Nov 19 '18 at 20:45









Berbatov












3 Answers




















I think the prices are loaded by JavaScript, so you will need a method like selenium to ensure the values are present (or an API call, as shown in the other answer!)

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.autozone.com/motor-oil-and-transmission-fluid/engine-oil?pageNumber=1")
products = driver.find_elements_by_css_selector('.prodName')
prices = driver.find_elements_by_css_selector('.price[id*=retailpricediv]')

productList = []
priceList = []
for product, price in zip(products, prices):
    productList.append(product.text)
    priceList.append(price.text.split('\n')[0].replace('Price: ', ''))

df = pd.DataFrame({'Product': productList, 'Price': priceList})
print(df)

driver.quit()
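The split('\n')[0].replace('Price: ', '') step above assumes the price text always starts with a literal "Price: " label. A slightly more defensive variant (a sketch, not from the original answer) pulls the dollar amount out with a regex:

import re

def parse_price(text):
    # find the first dollar amount in strings like "Price: $8.99\nSome note"
    m = re.search(r'\$\d+(?:\.\d{2})?', text)
    return m.group(0) if m else None

print(parse_price("Price: $8.99\nLow Price Guarantee"))  # -> $8.99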





You can get the required price data with a direct call to the API:

import requests

url = 'https://www.autozone.com/rest/bean/autozone/diy/commerce/pricing/PricingServices/retrievePriceAndAvailability?atg-rest-depth=2'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'}
data = {'arg1': 6997, 'arg2': '', 'arg3': '663653,663636,663650,5531,663637,663639,644036,663658,663641,835241,663645,663642', 'arg4': ''}
response = requests.post(url, headers=headers, data=data).json()

for item in response['atgResponse']:
    print(item['retailPrice'])

Output:

8.99
8.99
10.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99
8.99

To create the data dict you need to pass the store number as arg1 and a comma-separated list of item ids as arg3.

You can get the arg1 value once, but arg3 should be extracted on each page:

from bs4 import BeautifulSoup as bs

page_url = 'https://www.autozone.com/external-engine/oil-filter?pageNumber=1'
r = requests.get(page_url, headers=headers)
source = bs(r.text, 'html.parser')
arg1 = source.find('div', {'id': 'myStoreNum'}).text
arg3 = ",".join([_id['id'].strip('azid') for _id in source.find_all('div', {'class': 'categorizedShelfItem'})])

so now you can define data without hardcoding values:

data = {'arg1': arg1, 'arg2': '', 'arg3': arg3, 'arg4': ''}

To get values from the next page, just change pageNumber=1 to pageNumber=2 in page_url; the rest of the code remains the same.
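One caveat about the arg3 extraction above: str.strip('azid') strips any of the characters a, z, i, d from both ends of the string, not the literal prefix "azid". It happens to work here because the remaining id is all digits, but on Python 3.9+ str.removeprefix states the intent directly (a sketch with a made-up id value, assuming ids look like "azid663653"):

element_id = 'azid663653'  # hypothetical id from the page's markup

# str.strip treats its argument as a set of characters, not a prefix
assert element_id.strip('azid') == '663653'

# removeprefix (Python 3.9+) removes exactly the leading 'azid'
print(element_id.removeprefix('azid'))  # -> 663653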






You can peel the same apple in different ways. Here is another approach using selenium:

from selenium import webdriver
from contextlib import closing

with closing(webdriver.Chrome()) as driver:
    driver.get("https://www.autozone.com/external-engine/oil-filter?pageNumber=1")
    for items in driver.find_elements_by_css_selector("[typeof='Product']"):
        price = items.find_element_by_css_selector('.price > strong').text
        print(price)

Output:

$8.99
$8.99
$10.99
$8.99
$8.99

and so on ....






– QHarr, answered Nov 19 '18 at 21:29 (edited Nov 19 '18 at 21:58)

























– Andersson, answered Nov 19 '18 at 21:27 (edited Nov 19 '18 at 22:27)























– SIM, answered Nov 20 '18 at 6:05





























