This project is a Python-based web scraper designed to extract movie-related information from The Movie Database (TMDB). Using libraries like requests and BeautifulSoup, it collects data such as movie titles, ratings, genres, and cast details. The extracted data is organized into structured formats using Pandas and exported to a CSV file for further analysis.
- Web Scraping: Extracts movie details from multiple pages of the TMDB website.
- Data Storage: Combines data into Pandas DataFrames and exports as CSV.
- Error Handling: Implements robust mechanisms for handling request failures.
- Reusable Functions: Includes modular user-defined functions for easy extensibility.
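As an illustration of the error-handling and reusable-function points above, here is a minimal sketch of a fetch helper that retries failed requests. The name `fetch_page`, its parameters, and the retry policy are assumptions made for this example and are not taken from `main.py`.

```python
import time
from typing import Optional

import requests


def fetch_page(url: str,
               headers: Optional[dict] = None,
               retries: int = 3,
               backoff: float = 1.0,
               session: Optional[requests.Session] = None) -> Optional[str]:
    """Return the page HTML, or None if every attempt fails.

    Hypothetical helper: the real script may structure this differently.
    """
    http = session if session is not None else requests.Session()
    for attempt in range(1, retries + 1):
        try:
            response = http.get(url, headers=headers, timeout=10)
            response.raise_for_status()  # treat 4xx/5xx responses as failures
            return response.text
        except requests.RequestException as exc:
            print(f"Attempt {attempt}/{retries} failed for {url}: {exc}")
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff * attempt)
    return None
```

Returning `None` instead of raising lets a caller skip a page that cannot be fetched and continue with the remaining pages.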
Ensure you have the following installed:
- Python 3.7+
- Pip (Python package manager)
- Clone this repository:

      git clone <repository_url>
      cd tmdb-movie-data-scraper

- Install the required Python libraries:

      pip install -r requirements.txt

- Run the script:

      python main.py
The script fetches data from the first 6 pages of TMDB and combines the results into a single CSV file.
You can customize the number of pages to scrape or adjust the request headers by editing the main.py script.
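The kind of settings involved might look like the sketch below. The constant names, the listing URL, and the header values are assumptions for illustration; check `main.py` for the names and values the script actually uses.

```python
from urllib.parse import urlencode

# Assumed values: adjust them to match what main.py defines.
BASE_URL = "https://www.themoviedb.org/movie"  # assumed listing URL
NUM_PAGES = 6                                  # the script's default page count
HEADERS = {
    # Example browser-like headers; the real script may send different ones.
    "User-Agent": "Mozilla/5.0 (compatible; tmdb-movie-data-scraper)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_page_urls(num_pages: int = NUM_PAGES, base_url: str = BASE_URL) -> list:
    """Return one listing URL per page, numbered from 1."""
    return [
        f"{base_url}?{urlencode({'page': page})}"
        for page in range(1, num_pages + 1)
    ]


if __name__ == "__main__":
    for url in build_page_urls():
        print(url)
```

Changing `NUM_PAGES` (or passing `num_pages` explicitly) is then the only edit needed to scrape more or fewer pages.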
The combined movie data is saved as Combined_Data.csv in the project directory.
- CSV File: Contains the following columns:
  - Title
  - Rating
  - Genres
  - Cast
Example output:
| Title | Rating | Genres | Cast |
|---|---|---|---|
| The Shawshank... | 9.3 | Drama, Crime | Tim Robbins, ... |
| The Godfather | 9.2 | Drama, Crime | Marlon Brando,... |
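Combining per-page results and writing the CSV can be sketched with Pandas as follows. The function name `combine_and_export` and the placeholder rows are hypothetical; only the column names and the `Combined_Data.csv` filename come from this README.

```python
import pandas as pd

COLUMNS = ["Title", "Rating", "Genres", "Cast"]


def combine_and_export(page_frames, path="Combined_Data.csv"):
    """Concatenate per-page DataFrames and write them to a single CSV file."""
    combined = pd.concat(page_frames, ignore_index=True)
    combined = combined[COLUMNS]        # enforce a consistent column order
    combined.to_csv(path, index=False)  # omit the DataFrame index column
    return combined


# Placeholder rows standing in for two scraped pages (not real TMDB data).
page_1 = pd.DataFrame([
    {"Title": "Example Movie A", "Rating": 8.1,
     "Genres": "Drama", "Cast": "Actor One, Actor Two"},
])
page_2 = pd.DataFrame([
    {"Title": "Example Movie B", "Rating": 7.4,
     "Genres": "Comedy, Romance", "Cast": "Actor Three"},
])
```

Calling `combine_and_export([page_1, page_2])` writes both rows to `Combined_Data.csv` in the current directory and returns the combined DataFrame.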
Technologies used:
- Python
- Requests
- BeautifulSoup
- Pandas
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
Thanks to The Movie Database (TMDB) for providing the data.