Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is a very powerful crawler framework: it crawls efficiently and is highly extensible, making it an essential tool for developing crawlers in Python. With Scrapy you can of course run crawls on your own host, but when a crawl is very large, running it on your own machine is no longer practical. A better approach is to deploy the Scrapy project to a remote server and run it there. This is where Scrapyd comes in: install Scrapyd on the remote server and start its service, and you can deploy your Scrapy projects to that host. In addition, Scrapyd provides a set of operation APIs that give you full control over the deployed Scrapy projects, as shown in the sketch below.
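As a minimal sketch of what that API enables, the snippet below uses the python-scrapyd-api package (one of the components named in the title) to schedule a spider run and check job status against a running Scrapyd server. The server URL, project name, and spider name are placeholders for illustration:

```python
from scrapyd_api import ScrapydAPI

# Connect to a Scrapyd server; 6800 is Scrapyd's default port.
scrapyd = ScrapydAPI('http://localhost:6800')

# Schedule a spider run; "myproject" and "myspider" are hypothetical names.
job_id = scrapyd.schedule('myproject', 'myspider')
print('started job:', job_id)

# List pending, running, and finished jobs for the project.
jobs = scrapyd.list_jobs('myproject')
print('running jobs:', jobs['running'])

# A running job can also be cancelled by its id:
# scrapyd.cancel('myproject', job_id)
```

Scrapyd exposes the same operations as plain HTTP endpoints (schedule.json, listjobs.json, cancel.json), so the wrapper above is just a thin convenience layer.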
Features
- Gerapy is developed with Python 3.x; Python 2.x support may be added later
- Install Gerapy with pip (see the quick-start commands after this list)
- In Gerapy, you can create a configurable project and then generate Scrapy code for it automatically
- You can place your existing Scrapy projects in the projects folder
- After deployment, you can manage jobs on the Monitor page
- You can edit your projects through the web page
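As a minimal quick start for the install item above, the commands below install Gerapy from PyPI and bring up its web interface; the workspace directory name and the port shown are assumed defaults:

```bash
pip3 install gerapy   # install Gerapy from PyPI
gerapy init           # create a workspace directory named "gerapy"
cd gerapy
gerapy migrate        # initialize the database
gerapy runserver      # start the web UI, at http://127.0.0.1:8000 by default
```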