Exotic Amazon (Chinese mirror: exotic-amazon) is a complete solution for crawling the entire amazon.com website. It is ready to use out of the box, covers most Amazon data types, and will remain free and open source permanently.
Data collection on other e-commerce platforms follows much the same methods and processes, so you can adapt the business logic in this project to them. The underlying infrastructure handles the difficulties that large-scale data collection faces.
Thanks to the comprehensive web data management infrastructure provided by PulsarRPA, the entire solution consists of no more than 3500 lines of Kotlin code and fewer than 700 lines of X-SQL, which together extract more than 650 fields.
- Best Seller - Updated daily, about 32,000 categories, about 4,000,000 product records
- Most Wished For - Updated daily, about 25,000 categories, about 3,500,000 product records
- New Releases - Updated daily, about 25,000 categories, about 3,000,000 product records
- Movers and Shakers - About 20 categories, updated every hour
- Products - About 20,000,000 products, updated monthly
  - 70+ fields
  - Titles, prices, inventory, images, descriptions, specifications, stores, etc.
  - Sponsored products, similar products, related products, etc.
  - Hot reviews, etc.
- Review - Updated daily
git clone https://github.com/platonai/exotic-amazon.git
cd exotic-amazon && mvn
java -jar target/exotic-amazon*.jar
# Or on Windows:
java -jar target/exotic-amazon-{the-actual-version}.jar
Open System Glances to get a clear view of the system status.
All extraction rules (Chinese mirror: exotic-amazon) are written in X-SQL. X-SQL's inline processing also handles data type conversion and data cleaning, which is an important reason why we developed X-SQL. A good example is x-asin.sql (Chinese mirror: exotic-amazon), which extracts more than 70 fields from each product page.
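To give a feel for the kind of type conversion and cleaning mentioned above, here is a minimal sketch in plain Kotlin. It is not X-SQL and does not use any PulsarRPA API; the helper names `firstFloat` and `cleanText` are hypothetical, and the number parsing assumes US-style formatting (comma as thousands separator).

```kotlin
// Hypothetical helpers for illustration only; not part of X-SQL or PulsarRPA.

/** Returns the first decimal number found in [text], or [default] if there is none. */
fun firstFloat(text: String, default: Double = 0.0): Double {
    // Assumption: commas are thousands separators, so "1,299.00" becomes "1299.00".
    val normalized = text.replace(",", "")
    val match = Regex("""-?\d+(\.\d+)?""").find(normalized)
    return match?.value?.toDouble() ?: default
}

/** Trims the text and collapses every run of whitespace into a single space. */
fun cleanText(text: String): String = text.trim().replace(Regex("""\s+"""), " ")

fun main() {
    println(firstFloat("\$1,299.00"))             // 1299.0
    println(firstFloat("4.6 out of 5 stars"))     // 4.6
    println(cleanText("  Apple   iPhone\n 15 "))  // Apple iPhone 15
}
```

In the project itself this work is done inline inside the X-SQL rules, so the extracted columns arrive already typed and cleaned.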
By default, results are written in JSON format to the local file system.
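As a rough illustration of writing one record as JSON to the local file system, the sketch below uses only the Kotlin and Java standard libraries. It is not the project's actual writer: the function names, the one-file-per-record layout, the `asin` key used for the file name, and the sample values are all assumptions made for this example.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

/** Escapes a string for use inside a JSON string literal. */
fun jsonEscape(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
}

/** Serializes a flat record to a JSON object; numbers and booleans stay unquoted. */
fun toJson(record: Map<String, Any?>): String =
    record.entries.joinToString(prefix = "{", postfix = "}", separator = ",") { (k, v) ->
        val value = when (v) {
            null -> "null"
            is Number, is Boolean -> v.toString()
            else -> "\"" + jsonEscape(v.toString()) + "\""
        }
        "\"" + jsonEscape(k) + "\":" + value
    }

/** Writes the record to <dir>/<asin>.json (hypothetical layout) and returns the file path. */
fun saveAsJson(dir: Path, record: Map<String, Any?>): Path {
    Files.createDirectories(dir)
    val file = dir.resolve("${record["asin"]}.json")
    file.toFile().writeText(toJson(record))
    return file
}

fun main() {
    // Placeholder values, not real extracted data.
    val record = mapOf("asin" to "B000000000", "title" to "Sample \"product\"", "price" to 19.99)
    val file = saveAsJson(Files.createTempDirectory("exotic-amazon-demo"), record)
    println(file.toFile().readText())
}
```

The hand-rolled serializer only covers flat records with finite numbers; a real pipeline would more likely rely on a JSON library.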
There are several ways to persist the results:
- Serialize the results as key-value pairs and save them as a field of the WebPage object, which is the core data structure of the entire system. This feature is enabled by default.
- Write the results to a JDBC-compatible database, such as MySQL, PostgreSQL, MS SQL Server, Oracle, etc.
- Write a few lines of code to save the results to any destination you wish.
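The last two options can be sketched as a small sink abstraction. Everything here is hypothetical and meant only to show the shape such code might take: `ResultSink`, `InMemorySink`, `JdbcSink`, `insertSql`, and the table name `asin_items` are not PulsarRPA or Exotic Amazon APIs. The JDBC part uses only the standard `java.sql` interfaces and assumes you supply a `Connection` from whichever driver you use.

```kotlin
import java.sql.Connection

/** Hypothetical destination for extracted records. */
fun interface ResultSink {
    fun save(record: Map<String, Any?>)
}

/** Keeps records in memory; stands in for a file, queue, or any other custom destination. */
class InMemorySink : ResultSink {
    val records = mutableListOf<Map<String, Any?>>()
    override fun save(record: Map<String, Any?>) {
        records.add(record)
    }
}

/** Builds a parameterized INSERT statement; the values are bound later, never concatenated. */
fun insertSql(table: String, columns: List<String>): String {
    require(columns.isNotEmpty()) { "at least one column is required" }
    // Identifiers cannot be bound as parameters, so reject anything that is not a plain name.
    val identifier = Regex("[A-Za-z_][A-Za-z0-9_]*")
    (columns + table).forEach { require(identifier.matches(it)) { "unsafe identifier: $it" } }
    val placeholders = columns.joinToString(", ") { "?" }
    return "INSERT INTO $table (${columns.joinToString(", ")}) VALUES ($placeholders)"
}

/** Writes each record as one row through a caller-supplied JDBC connection. */
class JdbcSink(
    private val connection: Connection,
    table: String,
    private val columns: List<String>,
) : ResultSink {
    private val sql = insertSql(table, columns)

    override fun save(record: Map<String, Any?>) {
        connection.prepareStatement(sql).use { stmt ->
            // JDBC parameter indexes start at 1; missing fields are stored as NULL.
            columns.forEachIndexed { i, column -> stmt.setObject(i + 1, record[column]) }
            stmt.executeUpdate()
        }
    }
}

fun main() {
    val sink = InMemorySink()
    sink.save(mapOf("asin" to "B000000000", "title" to "Sample product")) // placeholder values
    println(sink.records.size)
    println(insertSql("asin_items", listOf("asin", "title")))
}
```

Swapping `InMemorySink` for `JdbcSink(connection, "asin_items", listOf("asin", "title"))` would send the same records to a database without changing the calling code, which is the main reason for hiding the destination behind a single-method interface.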