A single-page-application comic site built with React + Python/Flask + SQLite + Scrapy. It also uses Gunicorn and Fabric (Python management and deployment tools), which I relied on when I built the automated deployment myself.
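Since the deployment scripts themselves aren't shown here, below is a minimal sketch of what a Fabric task for this setup could look like, assuming Fabric 2.x — the host, remote path, build step, and restart command are placeholders for illustration, not this project's actual fabfile:

```python
# fabfile.py — minimal deployment sketch (Fabric 2.x assumed; host, paths,
# and restart command are placeholders, not this project's actual fabfile)
from fabric import task

@task
def deploy(c):
    # invoke with: fab -H user@yourserver deploy
    with c.cd('/srv/comic-site'):
        c.run('git pull')                          # fetch the latest code
        c.run('pip install -r requirements.txt')   # backend dependencies
        c.run('npm install && npm run build')      # build the React frontend
        # reload the Gunicorn workers serving the Flask backend
        c.run('pkill -HUP -f "gunicorn.*web_server" || true')
```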



Preconditions: node.js, python3
npm install
pip install -r requirements.txt
cd server
python web_server.py
npm start
Then visit localhost:3000 in your browser.
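For a production-style run, the backend can be served with Gunicorn instead of the Flask development server. A sketch, assuming the Flask application object is named `app` inside `server/web_server.py`:

```
cd server
gunicorn -w 4 -b 127.0.0.1:8000 web_server:app
```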
soul_manga_spider.py defines three ways of crawling. REQ_TYPE selects the URL type to crawl: a single comic, all comics on a single list page, or all comics. There is also an is_update parameter that restricts the crawl to recently updated page URLs, so only incremental updates are performed. When I deployed this myself, a crontab job crawling once every 12 hours was basically enough. By default is_update is false and REQ_TYPE is unset, so the spider does nothing and the site serves the database I have already crawled. To change logging, adjust LOG_LEVEL and LOG_FILE in setting.py as needed.
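For the scheduled crawl, a crontab entry along the following lines does the job. The launch command is an assumption for illustration (the spider name, and whether is_update is passed as a `-a` spider argument, depend on the project); substitute however you actually run soul_manga_spider.py:

```
# run an incremental crawl every 12 hours (command is illustrative)
0 */12 * * * cd /path/to/repo && scrapy crawl soul_manga -a is_update=true
```

LOG_LEVEL and LOG_FILE are standard Scrapy settings, so in setting.py the logging configuration would look roughly like this (the values shown are examples, not the project's defaults):

```python
# setting.py — Scrapy logging configuration (example values)
LOG_LEVEL = 'INFO'                 # DEBUG / INFO / WARNING / ERROR
LOG_FILE = 'logs/soul_manga.log'   # omit or set to None to log to stderr
```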