Crawler-OpenSource-List

Crawler OpenSource List | Index of Open-Source Crawler Frameworks

2019-SpiderFlow : A new-generation crawler platform that defines crawl flows graphically, so you can build a crawler without writing code.
2020-Crawlab : Distributed web crawler admin platform for managing spiders regardless of language or framework.

x-ray : The next web scraper. See through the noise.
headless-chrome-crawler : Distributed crawler powered by Headless Chrome.
apify-js : Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
2021-Crawlee : A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.

2018-Scrapy : Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
2019-pyspider : A Powerful Spider(Web Crawler) System in Python.
Photon : Incredibly fast crawler which extracts urls, emails, files, website accounts and much more.
Gerapy : Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js.

2015-go_spider : An awesome concurrent crawler (spider) framework in Go. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.
2017-Colly : Lightning Fast and Elegant Scraping Framework for Gophers.
2018-ferret : ferret is a web scraping system that aims to simplify data extraction from the web for things such as UI testing, machine learning and analytics.
2019-Hakrawler : Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application.
2022-katana : A next-generation crawling and spidering framework.

Crawler4j : crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes.
2015-WebMagic : A scalable crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction and persistence. It can simplify the development of a specific crawler.

2023-EasySpider : A visual no-code/code-free web crawler/spider. A visual crawler application for graphically designing and running crawl tasks without writing code.

Last updated on 2023-04-16