Project information
- Programming language: Python
- Libraries: requests, csv
- Skills: Web Scraping
- Project date: 2024
- Project URL: GitHub
Overview
This project is a Python-based web scraper that extracts customer reviews and feedback from AliExpress. It rotates proxies to avoid rate limiting and includes utilities for handling CSV files, among other features. The resulting .csv dataset has several potential use cases, such as exploratory data analysis (EDA) and sentiment analysis to determine the overall sentiment of customer reviews. This can help gauge customer satisfaction, identify common issues, and understand product performance.
Methodology
The approach I took for this project was to:
- Retrieve HTTP headers for web scraping.
- Generate dynamic parameters for request URLs.
- Rotate through a list of HTTP proxies.
- Check if proxies work with external websites (e.g., Amazon).
- Save extracted data to CSV files, with conditional creation of the CSV file and headers.
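Three of the steps above (dynamic parameters, proxy rotation, and conditional CSV creation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `build_params`, `proxy_rotation`, and `append_rows`, the parameter keys, and the cache-busting timestamp are all hypothetical, and the sketch uses only the standard library so it runs without network access.

```python
import csv
import itertools
import os
import time


def build_params(product_id, page, page_size=20):
    """Build query parameters for one page of reviews.

    The key names here are hypothetical; the real endpoint may expect others.
    """
    return {
        "productId": product_id,
        "page": page,
        "pageSize": page_size,
        "_": int(time.time() * 1000),  # assumed cache-busting timestamp
    }


def proxy_rotation(proxies):
    """Yield proxy mappings in an endless round-robin over the given addresses.

    Each mapping has the {"http": ..., "https": ...} shape that the
    requests library accepts for its `proxies` argument.
    """
    for address in itertools.cycle(proxies):
        yield {"http": address, "https": address}


def append_rows(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

With requests, each mapping from `proxy_rotation` could then be passed along with the headers and parameters, for example `requests.get(url, headers=headers, params=build_params(product_id, page), proxies=next(rotation), timeout=10)`, where `url`, `headers`, and `product_id` are placeholders. Checking the file before writing means repeated runs keep appending to the same dataset without duplicating the header row.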