πŸ•·οΈ Web Scraper & Crawler in Python

A Python-based web scraper and crawler that extracts structured data from websites, including sites that use anti-scraping measures such as header checks and CSRF protection.


🌐 Target Websites


πŸ› οΈ Features

  • ✅ Custom headers to bypass basic anti-bot detection
  • ✅ Automatic pagination support
  • ✅ CSRF token retrieval and session-based login handling
  • ✅ Output data to JSON
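
The header, pagination, and CSRF features listed above can be sketched as follows. This is a minimal illustration, not the repository's code: the field name `csrf_token`, the `li.next` pagination markup, the header values, and the names `HEADERS` and `scan_page` are all assumptions. It uses the standard library's `html.parser` in place of beautifulsoup4 so that it runs without any dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical browser-like headers; the values are illustrative only.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


class _PageScanner(HTMLParser):
    """Collects a hidden CSRF token and the 'next page' link from one page."""

    def __init__(self, token_field):
        super().__init__()
        self.token_field = token_field
        self.token = None
        self.next_href = None
        self._in_next = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == self.token_field:
            # Assumed form: <input type="hidden" name="csrf_token" value="...">
            self.token = attrs.get("value")
        elif tag == "li" and "next" in (attrs.get("class") or "").split():
            # Assumed pagination markup: <li class="next"><a href="..."></a></li>
            self._in_next = True
        elif tag == "a" and self._in_next and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next = False


def scan_page(html, page_url, token_field="csrf_token"):
    """Return (csrf_token, absolute_next_url); each is None if not found."""
    scanner = _PageScanner(token_field)
    scanner.feed(html)
    next_url = urljoin(page_url, scanner.next_href) if scanner.next_href else None
    return scanner.token, next_url


SAMPLE = """
<form><input type="hidden" name="csrf_token" value="abc123"></form>
<ul class="pager"><li class="next"><a href="page-2.html">next</a></li></ul>
"""

token, next_url = scan_page(SAMPLE, "https://example.com/catalogue/page-1.html")
```

In a real run, `HEADERS` would be passed with each request (for example through a `requests.Session`), the token would be sent back with the login form, and the crawl would follow `next_url` until it is `None`.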

πŸ“ Project Structure

Scraper/
├── Sync/                     # Synchronous scraping modules
│   ├── Categories.py         # Gets all book categories
│   ├── NamePrice.py          # Extracts book names and prices
│   └── Total.py              # Extracts book names and prices by category
│
├── async.py                  # Asynchronous scraping module
├── header.py                 # Manages headers/user-agents
├── Login.py                  # Handles login/authentication
└── Scrape.json               # Scraping output for "async.py"
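
The layout above separates synchronous modules from an asynchronous one that writes JSON output. A minimal sketch of that pattern, not the repository's actual code: `crawl`, `save_json`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real aiohttp request so the example runs without network access.

```python
import asyncio
import json


async def crawl(urls, fetch, limit=5):
    """Fetch every URL concurrently, with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return url, await fetch(url)

    # gather preserves input order, so the dict keys follow `urls`.
    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


def save_json(data, path):
    """Write scraped data to a JSON file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)


async def fake_fetch(url):
    """Stand-in for a network call; returns a placeholder body."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


pages = asyncio.run(crawl(["page-1", "page-2"], fake_fetch, limit=1))
```

Swapping `fake_fetch` for a coroutine that performs a real request and passing the result to `save_json(pages, "Scrape.json")` would produce a file like the one listed in the tree. The semaphore keeps concurrency bounded so the target site is not flooded.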

🧰 Tech Stack


βš™οΈ Setup Instructions

  1. Clone the Repository
git clone https://github.com/Argu333/Scraper.git
cd Scraper
  2. Install the required libraries (if not already installed)
pip install requests
pip install beautifulsoup4
pip install aiohttp
