scrapy初始化爬虫
cd ~/PycharmProjects/scrapy
####创建scrappy项目
scrapy startproject weibo
- 进入weibo项目路径
cd weibo
- 设置爬取限制的域名范围
scrapy genspider weibo weibo.com
配置
settings.py
#设置日志等级
LOG_LEVEL = "WARNING"
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
#开启pipeline
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
'yangguangovnment.pipelines.YangguangovnmentPipeline': 300,
}
创建crawlSpdier爬虫
scrapy genspider -t crawl cf circ.gov.cn
既已览卷至此,何不品评一二: