Scrapy教程

Scrapy 爬取

说明

要执行您的爬虫,请在您的 first_scrapy 目录中运行以下命令-
scrapy crawl first
其中, first 是创建爬虫时指定的爬虫的名称。
爬虫爬行后,您可以看到以下输出-
2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200) 
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200) 
<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)
正如您在输出中看到的,对于每个 URL,都有一个日志行, (referer: None) 声明这些 URL 是起始 URL,并且它们没有引用。接下来,您应该会看到在 first_scrapy 目录中创建了两个名为 Books.htmlResources.html 的新文件。
昵称: 邮箱:
Copyright © 2022 立地货 All Rights Reserved.
备案号:京ICP备14037608号-4