PoisonMunna/WebScraper

🌐 Professional Web Scraper

A powerful web scraping tool that extracts data from websites, monitors product prices, scrapes job listings, and sends email alerts. Perfect for price tracking, job hunting, and data collection.

Features

  • Price Monitoring - Track product prices and get alerts on drops
  • Job Listing Scraper - Extract job postings from career pages
  • News Headline Scraper - Get latest headlines from news websites
  • Custom Website Scraper - Scrape any website with CSS selectors
  • Email Alerts - Get notifications when prices drop
  • Multiple Output Formats - Save data as JSON or CSV
  • Configurable Settings - JSON-based configuration
  • Continuous Monitoring - Run in background for price tracking
  • User-Friendly CLI - Interactive menu system

Quick Start

Prerequisites

  • Python 3.7 or higher
  • Install required libraries: pip install requests beautifulsoup4

Installation

  1. Clone the repository:
    git clone https://github.com/poisonmunna/webscraper.git
    cd webscraper

  2. Install dependencies:
    pip install requests beautifulsoup4

  3. Run the scraper:
    python WebScraper.py

Interactive Menu

============================================================

🌐 PROFESSIONAL WEB SCRAPER

============================================================

📌 MAIN MENU

  1. 💰 Price Monitor (Track product prices)
  2. 💼 Job Scraper (Extract job listings)
  3. 📰 News Scraper (Get headlines)
  4. 🔧 Custom Scraper (Any website)
  5. 🚪 Exit

========================================

👉 Enter your choice (1-5): 1

Price Monitor Example

============================================================

💰 PRICE MONITOR MODE

============================================================

Enter product URL: https://www.amazon.com/product
Enter price CSS selector: .price
Enter target price: 299
Enable email alerts? (y/n): y
Your email: your_email@gmail.com
Email password: your_app_password
Recipient email: recipient@gmail.com
Check interval (seconds, default 3600): 3600

🚀 Starting price monitor...
Will check every 3600 seconds
Press Ctrl+C to stop

🔍 Monitoring price for: https://www.amazon.com/product
🎯 Target price: $299
💰 Current price: $289.99
🎉 PRICE ALERT! Price is now $289.99 (below $299)
✅ Price alert email sent!
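
The check in the session above comes down to parsing the scraped price text and comparing it with the target. Here is a minimal sketch of that step, assuming the price text looks like "$289.99" or "$1,289.99". The helper names `parse_price` and `should_alert` are hypothetical and are not taken from the scraper's source, and treating a price equal to the target as an alert is also an assumption.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Extract the first number from scraped price text such as '$1,289.99'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # e.g. "Out of stock" or an empty element
    # Drop thousands separators before converting.
    return float(match.group(0).replace(",", ""))


def should_alert(current: Optional[float], target: float) -> bool:
    """Alert only when a price was found and it is at or below the target (assumed rule)."""
    return current is not None and current <= target


current = parse_price("$289.99")
print(current, should_alert(current, 299.0))
```

Returning `None` for unparseable text keeps a missing or changed selector from being mistaken for a price drop.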

Job Scraper Example

============================================================

💼 JOB LISTING SCRAPER

============================================================

Enter job board URL: https://www.indeed.com/jobs
Enter CSS selectors:
Job container selector: .jobsearch-ResultsList
Job title selector: .jobTitle
Company selector: .companyName
Location selector: .companyLocation
Job link selector: .jobTitle a

🔍 Scraping job listings...

✅ Found 25 jobs

  1. Senior Python Developer
    Company: Google
    Location: Remote
    Apply: https://www.indeed.com/job/123

  2. Data Scientist
    Company: Microsoft
    Location: Seattle, WA
    Apply: https://www.indeed.com/job/456
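
The container and field selectors work together: the container selector picks out each listing, and the field selectors are then applied inside it. The sketch below shows one way this could look with BeautifulSoup. It assumes one container element per job. The sample HTML, the `.job-card` container class, the company names, and the helpers `text_or_none` and `extract_jobs` are all made up for illustration and may differ from what the scraper does internally.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up HTML standing in for a job board page.
SAMPLE_HTML = """
<div class="job-card">
  <h2 class="jobTitle"><a href="/job/123">Senior Python Developer</a></h2>
  <span class="companyName">Example Corp</span>
  <span class="companyLocation">Remote</span>
</div>
<div class="job-card">
  <h2 class="jobTitle"><a href="/job/456">Data Scientist</a></h2>
  <span class="companyName">Sample Inc</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def extract_jobs(html, base_url, container=".job-card"):
    """Apply the field selectors inside every element matched by the container selector."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select(container):
        link = card.select_one(".jobTitle a")
        has_href = link is not None and link.has_attr("href")
        jobs.append({
            "title": text_or_none(card, ".jobTitle"),
            "company": text_or_none(card, ".companyName"),
            "location": text_or_none(card, ".companyLocation"),
            # Resolve relative links against the page URL.
            "link": urljoin(base_url, link["href"]) if has_href else None,
        })
    return jobs


jobs = extract_jobs(SAMPLE_HTML, "https://www.example.com/jobs")
for job in jobs:
    print(job)
```

Scoping each field lookup to its own container keeps titles, companies, and locations aligned even when a listing is missing a field, as the second sample card is.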

News Scraper Example

============================================================

📰 NEWS HEADLINE SCRAPER

============================================================

Enter news website URL: https://www.bbc.com/news
Enter headline CSS selector: h2
Number of headlines (default 10): 5

🔍 Scraping headlines...

✅ Found 5 headlines:

  1. Breaking: New Technology Revolution
  2. Stock Market Hits New High
  3. Scientists Make Major Discovery
  4. Climate Change Summit Begins
  5. Space Mission Successfully Launched

Custom Scraper Example

============================================================

🔧 CUSTOM WEBSITE SCRAPER

============================================================

Enter website URL: https://www.example.com/products

Enter CSS selectors for data extraction:
Data name (or 'done' to finish): title
CSS selector for 'title': .product-title
Data name (or 'done' to finish): price
CSS selector for 'price': .product-price
Data name (or 'done' to finish): description
CSS selector for 'description': .product-desc
Data name (or 'done' to finish): done

🔍 Scraping data...

✅ Scraped data:

title: iPhone 14 Pro
price: $999
description: Latest iPhone with advanced features

CSS Selector Guide

What are CSS Selectors?

CSS selectors are patterns used to find HTML elements on a webpage.

Common Selectors

| Selector Type | Syntax | Example |
|---|---|---|
| Tag | tagname | h1, p, div |
| Class | .classname | .price, .title |
| ID | #idname | #price, #header |
| Attribute | [attribute="value"] | [type="text"] |
| Child | parent > child | div > .price |
| Descendant | parent child | div .price |
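
To see how these selector types behave, the sketch below runs one selector of each kind against a small made-up HTML snippet using BeautifulSoup's `select`. The HTML and the `selectors` mapping are illustrative assumptions. The mapping from a data name to a selector mirrors the prompts of the custom scraper but is not taken from its source.

```python
from bs4 import BeautifulSoup

# Made-up page: one price sits directly inside the div, the other is nested in a <p>.
SAMPLE_HTML = """
<div id="header"><h1>Shop</h1></div>
<div class="product">
  <span class="price">$999</span>
  <p><span class="price">$899</span></p>
  <input type="text" value="search">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One example per row of the selector table.
selectors = {
    "tag": "h1",
    "class": ".price",
    "id": "#header",
    "attribute": '[type="text"]',
    "child": "div > .price",
    "descendant": "div .price",
}

counts = {name: len(soup.select(sel)) for name, sel in selectors.items()}
for name, sel in selectors.items():
    print(f"{name:<11} {sel:<16} {counts[name]} match(es)")
```

The child selector matches only the price that is a direct child of the div, while the descendant selector also matches the one nested inside the paragraph.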

Finding CSS Selectors

  1. Right-click the element on the webpage
  2. Select "Inspect"
  3. The element is highlighted in DevTools
  4. Right-click the highlighted HTML
  5. Choose Copy → Copy selector

Email Configuration

Gmail Setup

  1. Enable 2-Factor Authentication on your Google account

  2. Generate App Password:

  3. When scraper asks:
    Enable email alerts? (y/n): y
    Your email: your_email@gmail.com
    Email password: [paste app password]
    Recipient email: recipient@gmail.com

Common Email Settings

| Provider | SMTP Server | Port |
|---|---|---|
| Gmail | smtp.gmail.com | 587 |
| Outlook | smtp.office365.com | 587 |
| Yahoo | smtp.mail.yahoo.com | 587 |
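
Port 587 is normally used with STARTTLS, so a client connects in plain text, upgrades the connection, and then logs in. The following is a rough sketch of sending an alert with Python's standard `smtplib`, assuming that mode. The function names `build_alert` and `send_alert` and the message wording are hypothetical, and the scraper's own email code may be organized differently.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, url, price, target):
    """Compose a plain-text price alert (wording is illustrative)."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: now ${price:.2f} (target ${target:.2f})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The price at {url} dropped to ${price:.2f}.")
    return msg


def send_alert(msg, password, smtp_server="smtp.gmail.com", smtp_port=587):
    """Send the message over SMTP, upgrading to TLS before logging in."""
    with smtplib.SMTP(smtp_server, smtp_port, timeout=30) as server:
        server.starttls()
        server.login(str(msg["From"]), password)
        server.send_message(msg)


msg = build_alert("your_email@gmail.com", "recipient@gmail.com",
                  "https://example.com/product", 289.99, 299)
print(msg["Subject"])
# send_alert(msg, "your_app_password")  # needs real credentials and network access
```

For Gmail, the password passed to `send_alert` would be the app password, not the regular account password.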

Configuration File

The scraper uses scraper_config.json for settings:

{
  "targets": [
    {
      "name": "Product Monitor",
      "url": "https://example.com/product",
      "selectors": {
        "price": ".price"
      },
      "type": "static"
    }
  ],
  "email_settings": {
    "enabled": true,
    "smtp_server": "smtp.gmail.com",
    "smtp_port": 587,
    "sender_email": "your_email@gmail.com",
    "sender_password": "your_app_password",
    "recipient_email": "recipient@gmail.com"
  },
  "scrape_interval": 3600,
  "output_format": "json"
}
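
A file like this can be read with the standard `json` module. The sketch below shows one possible loader that falls back to defaults when the file or a key is missing. The `DEFAULTS` values, the `load_config` name, and the validation of `output_format` are assumptions made for illustration, not the scraper's documented behavior.

```python
import copy
import json
from pathlib import Path

# Assumed fallbacks; the keys mirror the example config above.
DEFAULTS = {
    "targets": [],
    "email_settings": {"enabled": False, "smtp_port": 587},
    "scrape_interval": 3600,
    "output_format": "json",
}


def load_config(path="scraper_config.json"):
    """Read the config file, keeping defaults for anything it does not set."""
    config = copy.deepcopy(DEFAULTS)
    file = Path(path)
    if file.exists():
        user = json.loads(file.read_text(encoding="utf-8"))
        # Merge the nested email block key by key so partial settings keep defaults.
        config["email_settings"].update(user.pop("email_settings", {}))
        config.update(user)
    if config["output_format"] not in ("json", "csv"):
        raise ValueError(f"unsupported output_format: {config['output_format']!r}")
    return config
```

Calling `load_config("my_config.json")` returns a plain dictionary, so `config["scrape_interval"]` and `config["email_settings"]["smtp_server"]` can be read directly.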

Command Line Usage

| Command | Description |
|---|---|
| python web_scraper.py | Run in interactive mode |
| python web_scraper.py --mode price | Run price monitor directly |
| python web_scraper.py --mode jobs | Run job scraper directly |
| python web_scraper.py --mode news | Run news scraper directly |
| python web_scraper.py --mode custom | Run custom scraper directly |
| python web_scraper.py --config my_config.json | Use custom config file |
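
Flags like these are commonly handled with `argparse`. The sketch below shows how the two options in the table could be declared. It is an assumption about the shape of the interface, not the script's actual parsing code, and the default config path is taken from the Configuration File section.

```python
import argparse


def build_parser():
    """Declare the --mode and --config options listed in the table."""
    parser = argparse.ArgumentParser(prog="web_scraper.py")
    parser.add_argument(
        "--mode",
        choices=["price", "jobs", "news", "custom"],
        help="run one scraper directly; omit for the interactive menu",
    )
    parser.add_argument(
        "--config",
        default="scraper_config.json",
        help="path to the JSON configuration file",
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["--mode", "price", "--config", "my_config.json"])
print(args.mode, args.config)
```

With no `--mode`, `args.mode` is `None`, which a script could treat as the signal to show the interactive menu.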

Output Formats

JSON Output

{
  "title": "iPhone 14 Pro",
  "price": "$999",
  "timestamp": "2024-06-16T10:30:00"
}

CSV Output

title,price,timestamp
iPhone 14 Pro,$999,2024-06-16T10:30:00
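
Both formats can be produced from the same list of dictionaries with the standard library. This is a small sketch, assuming every record is a flat dictionary with the same keys. The function names `to_json` and `to_csv` are hypothetical.

```python
import csv
import io
import json
from datetime import datetime

record = {
    "title": "iPhone 14 Pro",
    "price": "$999",
    "timestamp": datetime(2024, 6, 16, 10, 30).isoformat(),
}


def to_json(records):
    """Serialize records as indented JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize a list of flat dicts; the first record defines the columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json([record]))
print(to_csv([record]))
```

Writing to a file instead only requires passing an open file object to `csv.DictWriter` or `json.dump`. When opening a CSV file for writing, pass `newline=""` as the `csv` module documentation recommends.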

Folder Structure

web-scraper/
│
├── web_scraper.py          # Main script
├── scraper_config.json     # Configuration file
├── previous_price.json     # Tracked prices
├── scraped_data_*.json     # Scraped data
├── scraped_data_*.csv      # Scraped data
└── README.md

Troubleshooting

Error: No module named 'requests'

Solution: pip install requests

Error: No module named 'bs4'

Solution: pip install beautifulsoup4

Email not sending

  • Check app password (not regular password)
  • Enable 2FA on Google account
  • Check SMTP server settings

CSS Selector not finding elements

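  • Verify the selector in browser DevTools
  • Check whether the page builds its content with JavaScript; requests only downloads the initial HTML, so elements added later will be missing
  • Try a different, more specific selector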

Rate Limiting

  • Add delays between requests
  • Use rotating user agents
  • Respect robots.txt
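
Two of these points, delays and robots.txt, can be combined using the standard `urllib.robotparser` module. The sketch below parses an inline, made-up robots.txt so that it runs without network access. The user agent string, the one-second default delay, and the function names are assumptions. The `fetch` argument stands in for whatever performs the HTTP request, for example a small wrapper around `requests.get`.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical user agent string

# Made-up rules; a real run would call set_url(...) and read() on the site's robots.txt.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
])


def polite_delay(robots, user_agent=USER_AGENT, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = robots.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(urls, fetch, robots, user_agent=USER_AGENT, sleep=time.sleep):
    """Fetch each URL that robots.txt allows, pausing between requests."""
    delay = polite_delay(robots, user_agent)
    allowed = [u for u in urls if robots.can_fetch(user_agent, u)]
    results = {}
    for index, url in enumerate(allowed):
        if index:
            sleep(delay)  # no pause needed before the first request
        results[url] = fetch(url)
    return results
```

Passing `sleep` as a parameter makes the pause easy to replace in tests. In normal use the default `time.sleep` applies.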

Safety Tips

  • Respect website terms of service
  • Use reasonable request intervals
  • Don't overload servers
  • Check robots.txt first
  • Use user-agent rotation
  • Implement error handling
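
For the error-handling tip, one common pattern is to retry a failed request a few times with growing pauses. The following is a generic sketch and not the scraper's own logic. `fetch_with_retry` is a hypothetical helper, and `fetch` is any callable that takes a URL and raises on failure.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # with requests, catch requests.RequestException instead
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay before the next try.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

With `requests`, `fetch` could be a function that calls `requests.get(url, timeout=10)` followed by `raise_for_status()`, so that HTTP error codes also trigger a retry.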

Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE file for more information.

Contact

Your Name - 123razz321@gmail.com

Project Link: https://github.com/poisonmunna/webscraper

Show Your Support

If this project helped you scrape data effectively, please give it a star on GitHub!


Made with Python | Extract Data, Get Insights! 🌐
