cd ~/PycharmProjects/scrapy

####创建scrappy项目

scrapy startproject weibo
  • 进入weibo项目路径
cd weibo
  • 设置爬取限制的域名范围
scrapy genspider weibo weibo.com

配置

settings.py

#设置日志等级
LOG_LEVEL = "WARNING"
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
#开启pipeline
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'yangguangovnment.pipelines.YangguangovnmentPipeline': 300,
}

创建crawlSpdier爬虫

 scrapy genspider -t crawl cf circ.gov.cn