A powerful web scraping tool that extracts data from websites, monitors product prices, scrapes job listings, and sends email alerts. Perfect for price tracking, job hunting, and data collection.
- Price Monitoring - Track product prices and get alerts on drops
- Job Listing Scraper - Extract job postings from career pages
- News Headline Scraper - Get latest headlines from news websites
- Custom Website Scraper - Scrape any website with CSS selectors
- Email Alerts - Get notifications when prices drop
- Multiple Output Formats - Save data as JSON or CSV
- Configurable Settings - JSON-based configuration
- Continuous Monitoring - Run in background for price tracking
- User-Friendly CLI - Interactive menu system
- Python 3.7 or higher
- Install required libraries: pip install requests beautifulsoup4
- Clone the repository:
  ```bash
  git clone https://github.com/poisonmunna/webscraper.git
  cd webscraper
  ```
- Install dependencies:
  ```bash
  pip install requests beautifulsoup4
  ```
- Run the scraper:
  ```bash
  python web_scraper.py
  ```
```text
============================================================
PROFESSIONAL WEB SCRAPER
============================================================
1. Price Monitor (Track product prices)
2. Job Scraper (Extract job listings)
3. News Scraper (Get headlines)
4. Custom Scraper (Any website)
5. Exit
========================================
Enter your choice (1-5): 1
```
```text
============================================================
PRICE MONITOR MODE
============================================================
Enter product URL: https://www.amazon.com/product
Enter price CSS selector: .price
Enter target price: 299
Enable email alerts? (y/n): y
Your email: your_email@gmail.com
Email password: your_app_password
Recipient email: recipient@gmail.com
Check interval (seconds, default 3600): 3600

Starting price monitor...
Will check every 3600 seconds
Press Ctrl+C to stop

Monitoring price for: https://www.amazon.com/product
Target price: $299
Current price: $289.99
PRICE ALERT! Price is now $289.99 (below $299)
Price alert email sent!
```
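The price check in the session above boils down to three steps: find the element matched by the price selector, turn its text into a number, and compare it with the target. A minimal sketch with BeautifulSoup follows. It is an illustration, not the code in `web_scraper.py`; the helper names are hypothetical, and `parse_price` assumes a US-style format such as `$1,299.99`.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup

# First number in strings like "$1,299.99" or "Price: 289.99 USD".
# Assumes "," is a thousands separator and "." the decimal point.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Convert scraped price text to a float, or None if it contains no number."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def extract_price(html: str, selector: str) -> Optional[float]:
    """Parse the text of the first element matching `selector` as a price."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    if element is None:
        return None  # the selector matched nothing on this page
    return parse_price(element.get_text(strip=True))


def is_below_target(current: Optional[float], target: float) -> bool:
    """True when a price was found and it is strictly below the target."""
    return current is not None and current < target
```

In a live run you would first download the page, for example with `requests.get(url, timeout=10).text`, then call `extract_price(html, ".price")` and repeat the check every `scrape_interval` seconds (for instance with `time.sleep`) until Ctrl+C raises `KeyboardInterrupt`.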
```text
============================================================
JOB LISTING SCRAPER
============================================================
Enter job board URL: https://www.indeed.com/jobs
Enter CSS selectors:
Job container selector: .jobsearch-ResultsList
Job title selector: .jobTitle
Company selector: .companyName
Location selector: .companyLocation
Job link selector: .jobTitle a

Scraping job listings...
Found 25 jobs

Senior Python Developer
  Company: Google
  Location: Remote
  Apply: https://www.indeed.com/job/123

Data Scientist
  Company: Microsoft
  Location: Seattle, WA
  Apply: https://www.indeed.com/job/456
```
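Extracting listings works per job card: the container selector picks out each card, and the title, company, location, and link selectors are applied inside it. The sketch below assumes that interpretation, so the container selector should match one card rather than the whole results list. `scrape_jobs` and the selector dictionary keys are hypothetical names chosen to mirror the prompts above.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from bs4.element import Tag


def _text(card: Tag, selector: str) -> Optional[str]:
    """Stripped text of the first match inside `card`, or None if absent."""
    element = card.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


def scrape_jobs(html: str, base_url: str,
                selectors: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Return one dict per element matched by selectors["container"].

    Hypothetical selector keys: "container", "title", "company",
    "location" and "link", mirroring the interactive prompts.
    """
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select(selectors["container"]):
        link = card.select_one(selectors["link"])
        href = link.get("href") if link is not None else None
        jobs.append({
            "title": _text(card, selectors["title"]),
            "company": _text(card, selectors["company"]),
            "location": _text(card, selectors["location"]),
            # Job boards often use relative links; resolve them against the page URL.
            "link": urljoin(base_url, href) if href else None,
        })
    return jobs
```

Missing fields come back as `None`, so one card with an unusual layout does not stop the rest from being collected.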
```text
============================================================
NEWS HEADLINE SCRAPER
============================================================
Enter news website URL: https://www.bbc.com/news
Enter headline CSS selector: h2
Number of headlines (default 10): 5

Scraping headlines...
Found 5 headlines:
- Breaking: New Technology Revolution
- Stock Market Hits New High
- Scientists Make Major Discovery
- Climate Change Summit Begins
- Space Mission Successfully Launched
```
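Headline scraping is a single `select` call plus some cleanup. The hypothetical `scrape_headlines` helper below is a sketch, not the project's implementation; skipping empty and duplicate texts is an assumption about what a broad selector such as `h2` typically needs.

```python
from typing import List

from bs4 import BeautifulSoup


def scrape_headlines(html: str, selector: str, limit: int = 10) -> List[str]:
    """Return up to `limit` unique, non-empty texts matching `selector`, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    headlines: List[str] = []
    for element in soup.select(selector):
        if len(headlines) >= limit:
            break
        # Join nested text with spaces so "<h2>Big <b>news</b></h2>" reads naturally.
        text = element.get_text(" ", strip=True)
        if text and text not in seen:
            seen.add(text)
            headlines.append(text)
    return headlines
```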
```text
============================================================
CUSTOM WEBSITE SCRAPER
============================================================
Enter website URL: https://www.example.com/products
Enter CSS selectors for data extraction:
Data name (or 'done' to finish): title
CSS selector for 'title': .product-title
Data name (or 'done' to finish): price
CSS selector for 'price': .product-price
Data name (or 'done' to finish): description
CSS selector for 'description': .product-desc
Data name (or 'done' to finish): done

Scraping data...
Scraped data:
title: iPhone 14 Pro
price: $999
description: Latest iPhone with advanced features
```
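The custom mode pairs each data name with a CSS selector and reads the first match for each. A minimal sketch of that mapping, using a hypothetical `scrape_fields` helper; the interactive prompt loop that builds the `fields` dictionary is left out.

```python
from typing import Dict, Optional

from bs4 import BeautifulSoup


def scrape_fields(html: str, fields: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each data name to the text of the first element its selector matches.

    A selector that matches nothing yields None instead of aborting the run.
    """
    soup = BeautifulSoup(html, "html.parser")
    result: Dict[str, Optional[str]] = {}
    for name, selector in fields.items():
        element = soup.select_one(selector)
        result[name] = element.get_text(strip=True) if element is not None else None
    return result
```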
CSS selectors are patterns used to find HTML elements on a webpage.
| Selector Type | Syntax | Example |
|---|---|---|
| Tag | tagname | h1, p, div |
| Class | .classname | .price, .title |
| ID | #idname | #price, #header |
| Attribute | [attribute="value"] | [type="text"] |
| Child | parent > child | div > .price |
| Descendant | parent child | div .price |
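To see how each selector type in the table behaves, the snippet below runs them against a small, made-up HTML document with BeautifulSoup's `select`. Note that the child selector `div > .price` finds nothing because the price sits inside a `<section>`, while the descendant selector `div .price` still matches it.

```python
from bs4 import BeautifulSoup

HTML = """
<div id="header"><h1>Store</h1></div>
<div class="card">
  <p class="title">Laptop</p>
  <section><span class="price">$999</span></section>
  <input type="text" name="q">
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One selector per row of the table, plus a child selector that misses.
EXAMPLES = {
    "tag": "h1",
    "class": ".title",
    "id": "#header",
    "attribute": '[type="text"]',
    "child": "div > .title",
    "child (miss)": "div > .price",
    "descendant": "div .price",
}

for kind, selector in EXAMPLES.items():
    # Print element names; text alone would be empty for <input>.
    names = [element.name for element in soup.select(selector)]
    print(f"{kind:<13} {selector:<15} -> {names}")
```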
- Right-click the element on the webpage
- Select "Inspect"
- The element is highlighted in the DevTools Elements panel
- Right-click the highlighted HTML
- Choose Copy > Copy selector
Gmail setup:
- Enable 2-Step Verification on your Google account
- Generate an app password:
  - Go to: https://myaccount.google.com/apppasswords
  - Select "Mail" as the app and "Other" as the device (if the page instead asks for an app name, enter any name)
  - Copy the 16-character password
- When the scraper asks, enter the app password instead of your regular password:
  ```text
  Enable email alerts? (y/n): y
  Your email: your_email@gmail.com
  Email password: [paste app password]
  Recipient email: recipient@gmail.com
  ```
| Provider | SMTP Server | Port |
|---|---|---|
| Gmail | smtp.gmail.com | 587 |
| Outlook | smtp.office365.com | 587 |
| Yahoo | smtp.mail.yahoo.com | 587 |
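Sending an alert through one of these providers takes a few lines of standard-library code: connect on port 587, upgrade the connection with STARTTLS, log in with the app password, and send. The sketch below shows one way such an alert could be built; it is not the project's mailer, and `build_price_alert`, `send_email`, and `SMTP_SERVERS` are hypothetical names.

```python
import smtplib
import ssl
from email.message import EmailMessage

# Servers and ports from the provider table (all use STARTTLS on 587).
SMTP_SERVERS = {
    "gmail": ("smtp.gmail.com", 587),
    "outlook": ("smtp.office365.com", 587),
    "yahoo": ("smtp.mail.yahoo.com", 587),
}


def build_price_alert(sender: str, recipient: str, url: str,
                      price: float, target: float) -> EmailMessage:
    """Compose a plain-text price-drop alert."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: now ${price:.2f} (target ${target:.2f})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"The price for {url} is now ${price:.2f}, "
        f"below your target of ${target:.2f}."
    )
    return msg


def send_email(msg: EmailMessage, server: str, port: int,
               sender: str, password: str) -> None:
    """Connect, upgrade to TLS with certificate checks, log in, and send."""
    context = ssl.create_default_context()
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(sender, password)  # for Gmail this is the app password
        smtp.send_message(msg)
```

For Gmail the call would be `send_email(msg, *SMTP_SERVERS["gmail"], sender, app_password)`. Building the message separately from sending it keeps the formatting testable without a network connection.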
The scraper uses `scraper_config.json` for settings:
```json
{
  "targets": [
    {
      "name": "Product Monitor",
      "url": "https://example.com/product",
      "selectors": {
        "price": ".price"
      },
      "type": "static"
    }
  ],
  "email_settings": {
    "enabled": true,
    "smtp_server": "smtp.gmail.com",
    "smtp_port": 587,
    "sender_email": "your_email@gmail.com",
    "sender_password": "your_app_password",
    "recipient_email": "recipient@gmail.com"
  },
  "scrape_interval": 3600,
  "output_format": "json"
}
```
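A loader for this file might fill in defaults for missing keys and reject obviously invalid values before any scraping starts. The sketch below is hypothetical: `load_config`, `DEFAULT_CONFIG`, and the default values used for missing keys are assumptions, not documented behaviour of `web_scraper.py`.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Assumed defaults for keys missing from the file (illustrative values).
DEFAULT_CONFIG: Dict[str, Any] = {
    "targets": [],
    "email_settings": {"enabled": False, "smtp_server": "smtp.gmail.com", "smtp_port": 587},
    "scrape_interval": 3600,
    "output_format": "json",
}


def load_config(path: str = "scraper_config.json") -> Dict[str, Any]:
    """Merge the JSON file at `path` over DEFAULT_CONFIG and validate the result."""
    config = copy.deepcopy(DEFAULT_CONFIG)  # never mutate the shared defaults
    file = Path(path)
    if file.exists():
        user = json.loads(file.read_text(encoding="utf-8"))
        for key, value in user.items():
            if isinstance(value, dict) and isinstance(config.get(key), dict):
                config[key].update(value)  # merge sections such as email_settings
            else:
                config[key] = value
    if config["output_format"] not in ("json", "csv"):
        raise ValueError(f"output_format must be 'json' or 'csv', got {config['output_format']!r}")
    if config["scrape_interval"] <= 0:
        raise ValueError("scrape_interval must be a positive number of seconds")
    return config
```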
| Command | Description |
|---|---|
| python web_scraper.py | Run in interactive mode |
| python web_scraper.py --mode price | Run price monitor directly |
| python web_scraper.py --mode jobs | Run job scraper directly |
| python web_scraper.py --mode news | Run news scraper directly |
| python web_scraper.py --mode custom | Run custom scraper directly |
| python web_scraper.py --config my_config.json | Use custom config file |
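Flags like these map naturally onto `argparse`. The sketch below only shows how `--mode` and `--config` could be parsed; it is not the project's actual argument handling, and `parse_args` and `MODES` are hypothetical names.

```python
import argparse
from typing import List, Optional

# Mode names taken from the command table; the parser itself is a sketch.
MODES = ("price", "jobs", "news", "custom")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line flags; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Web scraper")
    parser.add_argument("--mode", choices=MODES,
                        help="skip the menu and run one scraper directly")
    parser.add_argument("--config", default="scraper_config.json",
                        help="path to the JSON config file")
    return parser.parse_args(argv)
```

When `--mode` is omitted, `args.mode` is `None`, which a program structured this way could treat as "show the interactive menu". An unknown mode makes `argparse` print a usage error and exit.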
JSON output:
```json
{
  "title": "iPhone 14 Pro",
  "price": "$999",
  "timestamp": "2024-06-16T10:30:00"
}
```
CSV output:
```csv
title,price,timestamp
iPhone 14 Pro,$999,2024-06-16T10:30:00
```
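Writing the same records in either format needs only `json` and `csv` from the standard library. In this sketch `save_records` and its `stem` parameter (including the default value) are hypothetical, and the JSON branch writes a list of records, whereas the sample above shows a single object.

```python
import csv
import json
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def save_records(records: List[Dict[str, str]], fmt: str,
                 stem: str = "scraped_data") -> Path:
    """Write records to <stem>.json or <stem>.csv and return the file path.

    Records without a "timestamp" key get the current time in ISO 8601 format.
    """
    now = datetime.now().isoformat(timespec="seconds")
    rows = [{**record, "timestamp": record.get("timestamp", now)} for record in records]
    if fmt == "json":
        path = Path(f"{stem}.json")
        path.write_text(json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8")
    elif fmt == "csv":
        path = Path(f"{stem}.csv")
        # Header is the union of all keys, in first-seen order.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return path
```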
```text
webscraper/
│
├── web_scraper.py          # Main script
├── scraper_config.json     # Configuration file
├── previous_price.json     # Tracked prices
├── scraped_data_.json      # Scraped data
├── scraped_data_.csv       # Scraped data
└── README.md
```
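`previous_price.json` is described only as holding tracked prices. One plausible layout, assumed here rather than taken from the project, is a JSON object mapping each product URL to its last seen price; the hypothetical helpers below read and update such a file.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def load_prices(path: Path) -> Dict[str, float]:
    """Read the URL -> last price mapping, or return {} if there is none yet."""
    if not path.exists():
        return {}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}  # treat a corrupted file as empty rather than crashing


def record_price(path: Path, url: str, price: float) -> Optional[float]:
    """Save `price` for `url` and return the previously saved price, if any."""
    prices = load_prices(path)
    previous = prices.get(url)
    prices[url] = price
    # Write to a temporary file, then swap it in, so an interrupted write
    # does not leave a half-written file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(prices, indent=2), encoding="utf-8")
    tmp.replace(path)
    return previous
```

Comparing the returned previous price with the new one is one way to alert only on actual drops rather than on every check.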
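- `ModuleNotFoundError` for `requests`. Solution: `pip install requests`
- `ModuleNotFoundError` for `bs4` (the package that provides BeautifulSoup). Solution: `pip install beautifulsoup4`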
Email alerts not sending:
- Use an app password, not your regular account password
- Enable 2-Step Verification on your Google account
- Check the SMTP server and port for your provider
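Selector returns no data:
- Verify the selector in browser DevTools
- Check whether the website loads its content with JavaScript; `requests` only downloads the initial HTML, so elements added later by scripts will not be found
- Try a different, more specific selector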
Requests getting blocked:
- Add delays between requests
- Rotate user agents
- Respect robots.txt
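Delays and user-agent rotation can live in one small wrapper around the HTTP call. The sketch below assumes a `requests.Session`-like object (anything with a `get(url, headers=..., timeout=...)` method); `polite_get`, the delay range, the retry policy, and the example user-agent strings are all illustrative assumptions.

```python
import random
import time
from typing import Any, Callable, Sequence

# Example desktop user-agent strings (illustrative; keep your own list current).
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
)


def polite_get(session: Any, url: str, min_delay: float = 2.0, max_delay: float = 5.0,
               retries: int = 3, sleep: Callable[[float], None] = time.sleep) -> Any:
    """GET `url` with a random pause before each attempt and a rotated User-Agent.

    Retries on HTTP 429 and 5xx, doubling the pause each time. Returns the last
    response, so callers should still check its status code.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        sleep(random.uniform(min_delay, max_delay) * 2 ** attempt)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        response = session.get(url, headers=headers, timeout=10)
        if response.status_code != 429 and response.status_code < 500:
            return response
    return response
```

With `requests` installed, call it as `polite_get(requests.Session(), url)`. The `sleep` parameter exists so tests can record or skip the waits instead of actually pausing.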
Best practices:
- Respect the website's terms of service
- Use reasonable request intervals
- Don't overload servers
- Check robots.txt first
- Rotate user agents
- Implement error handling
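Python's standard library can check robots.txt through `urllib.robotparser`. The helpers below (`robots_url`, `make_checker`, and `is_allowed` are hypothetical names) show both a live check and parsing a robots.txt you already have as text. A `Crawl-delay` value, when the site sets one, is a sensible lower bound for your request interval.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Location of robots.txt for the site hosting `page_url`."""
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def make_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content you already have as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(page_url: str, user_agent: str = "*") -> bool:
    """Download the site's robots.txt and ask whether `page_url` may be fetched.

    This makes a network request; unreachable hosts raise urllib.error.URLError.
    """
    parser = RobotFileParser(robots_url(page_url))
    parser.read()
    return parser.can_fetch(user_agent, page_url)
```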
Contributions are welcome!
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
Distributed under the MIT License. See LICENSE file for more information.
Your Name - 123razz321@gmail.com
Project Link: https://github.com/poisonmunna/webscraper
If this project helped you scrape data effectively, please give it a star on GitHub!
Made with Python | Extract Data, Get Insights!