[settings] shell = bpython
scrapy shell <url>
Shortcut and description:

shelp()
Prints the help, listing the available objects and shortcuts.

fetch(request_or_url)
Fetches a new response from the given request or URL, and the associated objects (request, response) are updated accordingly.

view(response)
Opens the given response in your local browser for inspection. It injects a base tag into the response body so that external links and images display correctly.
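view(response) saves the response body to a temporary file and opens it in a browser, adding a base tag so relative links resolve against the original URL. The tag injection can be sketched with plain string handling (a simplified illustration, not Scrapy's actual implementation):

```python
# Simplified sketch of the <base> tag injection performed by
# view(response); not Scrapy's real implementation.
def add_base_tag(body: str, url: str) -> str:
    # Insert <base href="..."> right after <head> so relative links
    # and images in the saved page resolve against the original site.
    return body.replace("<head>", f'<head><base href="{url}">', 1)

page = "<html><head><title>t</title></head><body><img src='/logo.png'></body></html>"
patched = add_base_tag(page, "http://scrapy.org")
print(patched)
```

With the tag in place, a browser viewing the saved file fetches `/logo.png` from `http://scrapy.org` instead of the local filesystem.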
Object and description:

crawler
The current Crawler object.

spider
The spider known to handle the URL, or a default Spider object if no spider is found for the current URL.

request
The Request object of the last fetched page.

response
The Response object of the last fetched page.

settings
The current Scrapy settings.
scrapy shell 'http://scrapy.org' --nolog
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
[s]   item       {}
[s]   request    <GET http://scrapy.org>
[s]   response   <200 http://scrapy.org>
[s]   settings   <scrapy.settings.Settings object at 0x2bfd650>
[s]   spider     <Spider 'default' at 0x20c6f50>
[s] Useful shortcuts:
[s]   shelp()           Provides available objects and shortcuts with help option
[s]   fetch(req_or_url) Collects the response from the request or URL and associated objects will get updated
[s]   view(response)    View the response for the given request
>> response.xpath('//title/text()').extract_first()
u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
>> fetch("http://reddit.com")
[s] Available Scrapy objects:
[s]   crawler
[s]   item       {}
[s]   request
[s]   response   <200 https://www.reddit.com/>
[s]   settings
[s]   spider
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>> response.xpath('//title/text()').extract()
[u'reddit: the front page of the internet']
>> request = request.replace(method="POST")
>> fetch(request)
[s] Available Scrapy objects:
[s]   crawler
...
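The request.replace(method="POST") call above returns a new Request with the method changed, leaving the original untouched. The same copy-with-changes semantics can be sketched with dataclasses.replace from the standard library (FakeRequest below is a hypothetical stand-in, not Scrapy's Request class):

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for scrapy.Request, used only to illustrate
# the semantics of request.replace(method="POST"): a new object is
# returned with the given attribute changed; the original is untouched.
@dataclass(frozen=True)
class FakeRequest:
    url: str
    method: str = "GET"

req = FakeRequest(url="http://reddit.com")
post_req = replace(req, method="POST")
print(req.method, post_req.method)  # GET POST
```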
import scrapy

class SpiderDemo(scrapy.Spider):
    name = "spiderdemo"
    start_urls = [
        "http://mysite.com",
        "http://mysite1.org",
        "http://mysite2.net",
    ]

    def parse(self, response):
        # You can inspect one specific response
        if ".net" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response, self)
When the .net URL is crawled, scrapy.shell.inspect_response launches the shell from inside the spider:
2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://mysite.com> (referer: None)
2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://mysite1.org> (referer: None)
2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) <GET http://mysite2.net> (referer: None)
[s] Available Scrapy objects:
[s] crawler
...
>> response.url
'http://mysite2.net'
>> response.xpath('//div[@class = "val"]')
[]
>> view(response)
True
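The empty list returned by response.xpath('//div[@class = "val"]') simply means no element matched the selector. The same match-or-empty behavior can be sketched offline with the limited XPath support in Python's standard-library ElementTree (an illustration only; Scrapy's own selectors are built on lxml):

```python
import xml.etree.ElementTree as ET

# A tiny document that contains no <div class="val"> element.
html = '<html><body><div class="other">hello</div></body></html>'
root = ET.fromstring(html)

# No match: findall returns an empty list, just like the shell above.
no_hits = root.findall(".//div[@class='val']")
print(no_hits)  # []

# A selector that does match returns the elements instead.
hits = root.findall(".//div[@class='other']")
print([d.text for d in hits])  # ['hello']
```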