🌐 Professional Web Scraper

A web scraping tool that extracts data from websites, monitors product prices, scrapes job listings and news headlines, and sends email alerts when prices drop. Suited to price tracking, job hunting, and general data collection.

Features

Price Monitoring - Track product prices and get alerts on drops
Job Listing Scraper - Extract job postings from career pages
News Headline Scraper - Get latest headlines from news websites
Custom Website Scraper - Scrape any website with CSS selectors
Email Alerts - Get notifications when prices drop
Multiple Output Formats - Save data as JSON or CSV
Configurable Settings - JSON-based configuration
Continuous Monitoring - Run in background for price tracking
User-Friendly CLI - Interactive menu system

Quick Start

Prerequisites

Python 3.7 or higher
Install required libraries: pip install requests beautifulsoup4

Installation

Clone the repository:
git clone https://github.com/poisonmunna/webscraper.git
cd webscraper
Install dependencies:
pip install requests beautifulsoup4
Run the scraper:
python WebScraper.py

Interactive Menu

============================================================

🌐 PROFESSIONAL WEB SCRAPER

============================================================

📌 MAIN MENU

1. 💰 Price Monitor (Track product prices)
2. 💼 Job Scraper (Extract job listings)
3. 📰 News Scraper (Get headlines)
4. 🔧 Custom Scraper (Any website)
5. 🚪 Exit

========================================

👉 Enter your choice (1-5): 1

Price Monitor Example

============================================================

💰 PRICE MONITOR MODE

============================================================

Enter product URL: https://www.amazon.com/product
Enter price CSS selector: .price
Enter target price: 299
Enable email alerts? (y/n): y
Your email: your_email@gmail.com
Email password: your_app_password
Recipient email: recipient@gmail.com
Check interval (seconds, default 3600): 3600

🚀 Starting price monitor...
Will check every 3600 seconds
Press Ctrl+C to stop

🔍 Monitoring price for: https://www.amazon.com/product
🎯 Target price: $299
💰 Current price: $289.99
🎉 PRICE ALERT! Price is now $289.99 (below $299)
✅ Price alert email sent!
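
The comparison behind the alert above can be sketched as follows. This is a minimal illustration, not the code in WebScraper.py: the function names `parse_price` and `should_alert` are hypothetical, and the parser assumes US-style formatting (comma as thousands separator, period as decimal point). It alerts only when the price is strictly below the target, matching the "below $299" wording in the sample output.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first number found in text such as '$1,289.99', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def should_alert(current: Optional[float], target: float) -> bool:
    """Alert only when a price was found and it is strictly below the target."""
    return current is not None and current < target


current = parse_price("$289.99")
print(current, should_alert(current, 299))
```

Returning None for text with no number (for example "Out of stock") keeps a failed selector match from being mistaken for a price drop.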

Job Scraper Example

============================================================

💼 JOB LISTING SCRAPER

============================================================

Enter job board URL: https://www.indeed.com/jobs
Enter CSS selectors:
Job container selector: .jobsearch-ResultsList
Job title selector: .jobTitle
Company selector: .companyName
Location selector: .companyLocation
Job link selector: .jobTitle a

🔍 Scraping job listings...

✅ Found 25 jobs

1. Senior Python Developer
   Company: Google
   Location: Remote
   Apply: https://www.indeed.com/job/123

2. Data Scientist
   Company: Microsoft
   Location: Seattle, WA
   Apply: https://www.indeed.com/job/456

News Scraper Example

RESULT

============================================================

📰 NEWS HEADLINE SCRAPER

============================================================

Enter news website URL: https://www.bbc.com/news
Enter headline CSS selector: h2
Number of headlines (default 10): 5

🔍 Scraping headlines...

✅ Found 5 headlines:

Breaking: New Technology Revolution
Stock Market Hits New High
Scientists Make Major Discovery
Climate Change Summit Begins
Space Mission Successfully Launched

Custom Scraper Example

============================================================

🔧 CUSTOM WEBSITE SCRAPER

============================================================

Enter website URL: https://www.example.com/products

Enter CSS selectors for data extraction:
Data name (or 'done' to finish): title
CSS selector for 'title': .product-title
Data name (or 'done' to finish): price
CSS selector for 'price': .product-price
Data name (or 'done' to finish): description
CSS selector for 'description': .product-desc
Data name (or 'done' to finish): done

🔍 Scraping data...

✅ Scraped data:

title: iPhone 14 Pro
price: $999
description: Latest iPhone with advanced features
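
The name-to-selector mapping collected in the session above could be applied roughly like this. It is a sketch using BeautifulSoup's `select_one`; the function name `extract_fields` and the inline HTML are assumptions made for illustration, not taken from WebScraper.py.

```python
from bs4 import BeautifulSoup


def extract_fields(html: str, selectors: dict) -> dict:
    """Return {name: text} using the first element that matches each selector."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    for name, selector in selectors.items():
        element = soup.select_one(selector)
        # A selector that matches nothing yields None instead of raising.
        data[name] = element.get_text(strip=True) if element else None
    return data


# Stand-in for a downloaded page.
html = """
<div class="product">
  <h2 class="product-title">iPhone 14 Pro</h2>
  <span class="product-price">$999</span>
</div>
"""
selectors = {
    "title": ".product-title",
    "price": ".product-price",
    "description": ".product-desc",  # not present in the sample HTML
}
print(extract_fields(html, selectors))
```

In real use the HTML would come from a request such as `requests.get(url, timeout=10).text`.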

CSS Selector Guide

What are CSS Selectors?

CSS selectors are patterns used to find HTML elements on a webpage.

Common Selectors

Selector Type	Syntax	Example
Tag	tagname	h1, p, div
Class	.classname	.price, .title
ID	#idname	#price, #header
Attribute	[attribute="value"]	[type="text"]
Child	parent > child	div > .price
Descendant	parent child	div .price
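
To see how the selector types in the table behave, here is a small sketch that runs each one against a made-up HTML snippet with BeautifulSoup's `select`. The snippet and the helper `texts` are illustrative assumptions. Note the difference between the child and descendant forms.

```python
from bs4 import BeautifulSoup

html = """
<div id="header"><h1>Shop</h1></div>
<div class="item">
  <span class="price">$10</span>
  <p><span class="price">$12</span></p>
  <input type="text" value="search">
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def texts(selector: str) -> list:
    """Text of every element matched by a CSS selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


tag_matches = texts("h1")                    # tag selector
class_matches = texts(".price")              # class selector, both spans
id_matches = texts("#header")                # id selector
attr_matches = soup.select('[type="text"]')  # attribute selector
child_matches = texts("div > .price")        # only the span directly inside a div
descendant_matches = texts("div .price")     # spans at any depth inside a div

print(child_matches, descendant_matches)
```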

Finding CSS Selectors

Right-click the element you want on the webpage
Select "Inspect"
The element is highlighted in the DevTools Elements panel
Right-click the highlighted HTML
Choose Copy → Copy selector

Email Configuration

Gmail Setup

Enable 2-Factor Authentication on your Google account
Generate an app password:
- Go to: https://myaccount.google.com/apppasswords
- Select "Mail" as the app
- Select "Other" as the device
- Copy the 16-character password
When the scraper prompts you, enter:
Enable email alerts? (y/n): y
Your email: your_email@gmail.com
Email password: [paste app password]
Recipient email: recipient@gmail.com

Common Email Settings

Provider	SMTP Server	Port
Gmail	smtp.gmail.com	587
Outlook	smtp.office365.com	587
Yahoo	smtp.mail.yahoo.com	587
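
Port 587 on these servers expects a plain connection that is upgraded with STARTTLS before login. A minimal sketch of sending an alert this way with the standard library follows. The function names `build_alert` and `send_alert` and the message wording are assumptions, not the scraper's actual code, and `send_alert` is defined but not called here because it needs real credentials.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, url: str,
                price: float, target: float) -> EmailMessage:
    """Compose the alert email without sending it."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: ${price:.2f} (target ${target:.2f})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The price at {url} is now ${price:.2f}.")
    return msg


def send_alert(msg: EmailMessage, password: str,
               server: str = "smtp.gmail.com", port: int = 587) -> None:
    """Send over STARTTLS, logging in with an app password."""
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        smtp.starttls()  # upgrade the connection before sending credentials
        smtp.login(msg["From"], password)
        smtp.send_message(msg)


msg = build_alert("your_email@gmail.com", "recipient@gmail.com",
                  "https://example.com/product", 289.99, 299)
print(msg["Subject"])
```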

Configuration File

The scraper uses scraper_config.json for settings:

{
  "targets": [
    {
      "name": "Product Monitor",
      "url": "https://example.com/product",
      "selectors": {
        "price": ".price"
      },
      "type": "static"
    }
  ],
  "email_settings": {
    "enabled": true,
    "smtp_server": "smtp.gmail.com",
    "smtp_port": 587,
    "sender_email": "your_email@gmail.com",
    "sender_password": "your_app_password",
    "recipient_email": "recipient@gmail.com"
  },
  "scrape_interval": 3600,
  "output_format": "json"
}
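
A file like this could be read as sketched below. The key names come from the example above, while the function `load_config`, the fallback defaults, and the validation step are assumptions for illustration and may differ from what WebScraper.py does.

```python
import copy
import json
from pathlib import Path

# Assumed fallbacks for keys missing from the file.
DEFAULTS = {
    "targets": [],
    "email_settings": {"enabled": False},
    "scrape_interval": 3600,
    "output_format": "json",
}


def load_config(path: str = "scraper_config.json") -> dict:
    """Read the JSON config, keeping defaults for absent keys or a missing file."""
    config = copy.deepcopy(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    if config["output_format"] not in ("json", "csv"):
        raise ValueError(f"unsupported output_format: {config['output_format']!r}")
    return config


config = load_config("scraper_config.json")
print(config["scrape_interval"], config["output_format"])
```

Since the file holds an email password, it is worth keeping it out of version control.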

Command Line Usage

Command	Description
python WebScraper.py	Run in interactive mode
python WebScraper.py --mode price	Run price monitor directly
python WebScraper.py --mode jobs	Run job scraper directly
python WebScraper.py --mode news	Run news scraper directly
python WebScraper.py --mode custom	Run custom scraper directly
python WebScraper.py --config my_config.json	Use custom config file
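
The flags in the table map naturally onto `argparse`. The sketch below shows one way they might be declared. The `--mode` values and `--config` flag are from the table, but the default config path and the helper name `build_parser` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the --mode and --config flags listed in the table."""
    parser = argparse.ArgumentParser(description="Professional Web Scraper")
    parser.add_argument(
        "--mode",
        choices=["price", "jobs", "news", "custom"],
        help="run one scraper directly instead of showing the menu",
    )
    parser.add_argument(
        "--config",
        default="scraper_config.json",  # assumed default
        help="path to a JSON configuration file",
    )
    return parser


# Passing a list stands in for the real command line.
args = build_parser().parse_args(["--mode", "price"])
print(args.mode, args.config)
```

When `--mode` is omitted, `args.mode` is None, which a script can treat as the signal to show the interactive menu.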

Output Formats

JSON Output

{
  "title": "iPhone 14 Pro",
  "price": "$999",
  "timestamp": "2024-06-16T10:30:00"
}

CSV Output

title,price,timestamp
iPhone 14 Pro,$999,2024-06-16T10:30:00
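
Both formats can be produced from the same list of records with the standard library. The following is a sketch under the assumption that every record shares the same keys. `to_json` and `to_csv` are hypothetical helper names.

```python
import csv
import io
import json


def to_json(records: list) -> str:
    """Serialize records as indented JSON, keeping non-ASCII characters."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: list) -> str:
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"title": "iPhone 14 Pro", "price": "$999",
     "timestamp": "2024-06-16T10:30:00"},
]
print(to_json(records))
print(to_csv(records))
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are escaped correctly.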

Folder Structure

webscraper/
│
├── WebScraper.py # Main script
├── scraper_config.json # Configuration file
├── previous_price.json # Tracked prices
├── scraped_data_.json # Scraped data
├── scraped_data_.csv # Scraped data
└── README.md

Troubleshooting

Error: No module named 'requests'

Solution: pip install requests

Error: No module named 'bs4'

Solution: pip install beautifulsoup4

Email not sending

Use an app password, not your regular account password
Enable 2FA on the Google account (required before app passwords can be created)
Check that the SMTP server and port match your email provider

CSS Selector not finding elements

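Verify the selector in browser DevTools
Check whether the site renders its content with JavaScript; requests downloads only the initial HTML, so content added later by scripts will be missing
Try a different, more specific selector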

Rate Limiting

Add delays between requests
Use rotating user agents
Respect robots.txt
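
Two of these points, delays and robots.txt, can be sketched with the standard library alone. The helper names, the user agent string "my-scraper", and the sample rules below are assumptions for illustration. The rules are parsed from a string so the example runs offline.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_delay(base: float = 2.0, jitter: float = 1.0) -> float:
    """Sleep for base seconds plus random jitter; return the time slept."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay


rules = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(rules, "my-scraper", "https://example.com/products"))
print(allowed_by_robots(rules, "my-scraper", "https://example.com/private/page"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real robots.txt, and `polite_delay()` would be called between consecutive requests.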

Safety Tips

Respect website terms of service
Use reasonable request intervals
Don't overload servers
Check robots.txt first
Use user-agent rotation
Implement error handling

Contributing

Contributions are welcome!

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

License

Distributed under the MIT License. See LICENSE file for more information.

Contact

Your Name - 123razz321@gmail.com

Project Link: https://github.com/poisonmunna/webscraper

Show Your Support

If this project helped you scrape data effectively, please give it a star on GitHub!

Made with Python | Extract Data, Get Insights! 🌐
