Scrapy Debug crawled 200 and nothing return
I am working on a crawling project and trying to get the endorsement link for each band.

My code is as follows:

my code

It returned nothing. However, if I put each band's URL in start_urls, it works well. But it would be hard for me to list all the URLs manually in start_urls, since I am not even sure how many there are...

The log is shown:

log

Can anyone help? Thanks in advance!
  • Next time, it would be great if you could put your code directly in the question instead of in an image; that would be much more helpful. The same goes for the logs.

    – Guillaume
    Nov 21 '18 at 18:00
python web-scraping scrapy web-crawler scrapy-spider
edited Nov 21 '18 at 19:05 by Guillaume
asked Nov 20 '18 at 19:20 by Emily
1 Answer
Your restrict_xpaths expression looks wrong.

You could use the allow parameter instead, which is much easier:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = 'celebrityendorsers.com'
    start_urls = ['https://celebrityendorsers.com/endorsement/']

    rules = (
        Rule(LinkExtractor(allow='/endorsements/'), callback='parse_url_contents'),
    )

    def parse_url_contents(self, response):
        pass

This is the output log:

2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/playtex-wipes/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/plenish-cleanse/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/play-date-by-sarah-beckham/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/playstation-3/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/playmg/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/playsight/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/play-cloths/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/platinum-league-trading-cards/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/playstation/> (referer: https://celebrityendorsers.com/endorsement/)
2018-11-22 02:01:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://celebrityendorsers.com/endorsements/platinum-group/> (referer: https://celebrityendorsers.com/endorsement/)

If you really want to use XPath, then try removing [*].

The XPath that you commented out looks correct, but the callback is wrong: you cannot use the parse callback with a CrawlSpider, because CrawlSpider uses parse internally to implement its rules.
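The allow argument is treated as a regular expression that candidate URLs must match before a request is scheduled for them. A minimal plain-Python sketch of that filtering behaviour (illustrative only; this is not Scrapy's actual implementation, and filter_links is a made-up helper):

```python
import re

def filter_links(urls, allow):
    # Keep only URLs matching the allow pattern, roughly as
    # LinkExtractor(allow=...) does when deciding which links to follow.
    pattern = re.compile(allow)
    return [u for u in urls if pattern.search(u)]

links = [
    'https://celebrityendorsers.com/endorsements/playstation/',
    'https://celebrityendorsers.com/about/',
    'https://celebrityendorsers.com/endorsements/playmg/',
]

print(filter_links(links, '/endorsements/'))
# prints only the two /endorsements/ URLs
```

Any page link whose URL contains /endorsements/ is followed and handed to the callback, which is why the rule above discovers all the band pages without listing them in start_urls.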
answered Nov 21 '18 at 18:09 by Guillaume