
MohsinN05/WealthLab


💰 WEALTHLAB-AI

AI-Powered Financial Intelligence & Banking Analytics Platform

WEALTHLAB-AI is an end-to-end financial analytics platform built on real banking data from Caixabank Tech's 2024 AI Hackathon. It combines machine learning, time-series forecasting, and interactive visualizations to deliver fraud detection, customer segmentation, financial health scoring, and expense forecasting.

Dataset: Transactions Fraud Datasets — Kaggle


📁 Project Structure

WEALTHLAB-AI/
├── Data/
│   ├── Raw/                        ← Place original dataset files here
│   │   ├── transactions_data.csv
│   │   ├── cards_data.csv
│   │   ├── users_data.csv
│   │   ├── train_fraud_labels.json
│   │   └── mcc_codes.json
│   └── Processed/                  ← Cleaned CSVs saved here
│       ├── transactions_cleaned.csv
│       ├── cards_cleaned.csv
│       ├── users_cleaned.csv
│       ├── fraud_labels_cleaned.csv
│       └── mcc_codes_cleaned.csv
├── Database/
│   └── wealthlab.duckdb            ← DuckDB persistent database
├── Models/
│   ├── fraud_xgb.pkl               ← Trained XGBoost fraud model
│   ├── fraud_scaler.pkl            ← Feature scaler
│   └── fraud_threshold.pkl         ← Optimal prediction threshold
├── notebooks/
│   ├── transactions_cleaning.ipynb
│   ├── cards_cleaning.ipynb
│   ├── users_cleaning.ipynb
│   ├── fraud_labels_cleaning.ipynb
│   ├── mcc_codes_cleaning.ipynb
│   ├── merging_and_integration.ipynb
│   ├── feature_engineering.ipynb
│   ├── customer_segmentation.ipynb
│   ├── fraud_detection.ipynb
│   ├── financial_health_scoring.ipynb
│   ├── recommendation_engine.ipynb
│   └── expense_forecasting.ipynb
├── app.py                          ← Streamlit dashboard
├── requirements.txt
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.11.9

1. Clone the repository

git clone https://github.com/MohsinN05/WealthLab.git
cd WealthLab

2. Create a virtual environment

python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

3. Install dependencies

pip install -r requirements.txt

4. Add the dataset

Download the dataset from Kaggle and place all files inside Data/Raw/.

5. Run the notebooks in order

Run each notebook inside notebooks/ in the following order:

  1. transactions_cleaning.ipynb
  2. cards_cleaning.ipynb
  3. users_cleaning.ipynb
  4. fraud_labels_cleaning.ipynb
  5. mcc_codes_cleaning.ipynb
  6. merging_and_integration.ipynb
  7. feature_engineering.ipynb
  8. customer_segmentation.ipynb
  9. fraud_detection.ipynb
  10. financial_health_scoring.ipynb
  11. recommendation_engine.ipynb
  12. expense_forecasting.ipynb

6. Launch the dashboard

streamlit run app.py

🧠 Modules

1. Data Engineering

Cleans the five raw files and merges them into a single master table stored in DuckDB. Cleaning covers dollar-sign formatting in amount fields, null values, negative amounts, and anomalies in online transactions.
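As a rough illustration of the dollar-sign cleaning step, the sketch below parses amount strings into numbers using only the standard library. The function name `parse_amount` and the sample values are hypothetical, and the choice to return `None` for missing or unparseable values is an assumption; the notebooks may handle these cases differently (for example with Pandas).

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: Optional[str]) -> Optional[Decimal]:
    """Convert a string such as "$1,234.50" or "$-77.00" to a Decimal.

    Returns None for missing or unparseable values so that nulls can be
    handled explicitly in a later step (an assumption of this sketch).
    """
    if raw is None:
        return None
    text = raw.strip().replace("$", "").replace(",", "")
    if not text:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


# Hypothetical sample values, not taken from the dataset.
samples = ["$14.57", "$-77.00", "$1,234.50", "", None]
parsed = [parse_amount(s) for s in samples]

# Negative amounts are kept but can be flagged for separate treatment.
negatives = [p for p in parsed if p is not None and p < 0]
```

Using `Decimal` rather than `float` avoids rounding surprises when summing currency values; converting to `float` afterwards is fine if the downstream models expect it.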

2. Customer Segmentation

K-Means clustering on 1,219 customers using behavioral features. Produces four segments: Low Debt Stable, High Debt Spenders, Digital Active Users, and High Risk.
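To show what K-Means does under the hood, here is a minimal standard-library sketch of Lloyd's algorithm. It is not the project's implementation (the Tech Stack lists Scikit-learn for clustering). The two features, the sample points, the use of two clusters instead of four, and the "first k points" initialization are all assumptions made for brevity.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Minimal Lloyd's algorithm; returns one cluster index per point."""
    # Deterministic start: the first k points. Libraries such as
    # scikit-learn default to k-means++ seeding instead.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical, already-scaled features: (debt ratio, share of online spend).
customers = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.1), (0.8, 0.9), (0.1, 0.2), (0.9, 0.8)]
segments = kmeans(customers, k=2)
```

Because K-Means relies on Euclidean distance, features on very different scales (for example income and credit score) would normally be standardized first.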

3. Fraud Detection

Compares two models:

  • Autoencoder — unsupervised anomaly detection, ROC-AUC: 0.77
  • XGBoost — supervised classification, ROC-AUC: 0.85 ✅

Final model: XGBoost with threshold 0.3, achieving 86% fraud recall on 13M transactions.
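The effect of lowering the decision threshold can be sketched as follows. The scores and labels are made up for illustration, the helper names `classify` and `recall` are hypothetical, and treating a score equal to the threshold as fraud (`>=`) is an assumption.

```python
from typing import List, Sequence


def classify(probabilities: Sequence[float], threshold: float = 0.3) -> List[int]:
    """Flag a transaction as fraud (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual fraud cases that were flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up model scores and ground-truth labels, for illustration only.
scores = [0.05, 0.35, 0.45, 0.10, 0.80, 0.20]
labels = [0, 1, 1, 0, 1, 1]

recall_at_03 = recall(labels, classify(scores, 0.3))  # 3 of 4 frauds caught
recall_at_05 = recall(labels, classify(scores, 0.5))  # 1 of 4 frauds caught
```

On heavily imbalanced data, a lower threshold usually raises recall at the cost of more false positives, so precision should be tracked alongside it.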

4. Financial Health Scoring

Custom weighted formula combining savings ratio, credit score, debt-to-income ratio, and fraud history. Scores each customer from 0 to 100 and assigns one of four labels: Excellent, Stable, Moderate Risk, or Financially Vulnerable.
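A weighted score of this kind might look like the sketch below. The weights (30/30/30/10), the normalization ranges, and the label cut-offs (80/60/40) are all assumptions chosen for illustration; the actual formula lives in financial_health_scoring.ipynb and may differ.

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Restrict a value to the [low, high] range."""
    return max(low, min(high, value))


def health_score(savings_ratio: float, credit_score: float,
                 debt_to_income: float, fraud_incidents: int) -> float:
    """Combine four normalized components into a 0 to 100 score.

    Weights and ranges are hypothetical, not the project's actual values.
    """
    savings = clamp(savings_ratio)               # share of income saved
    credit = clamp((credit_score - 300) / 550)   # assumes a 300 to 850 scale
    debt = 1.0 - clamp(debt_to_income)           # less debt is healthier
    fraud = 1.0 if fraud_incidents == 0 else 0.0
    return round(30 * savings + 30 * credit + 30 * debt + 10 * fraud, 1)


def health_label(score: float) -> str:
    """Map a score to one of the four labels (cut-offs are assumptions)."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Stable"
    if score >= 40:
        return "Moderate Risk"
    return "Financially Vulnerable"


example = health_score(savings_ratio=0.2, credit_score=700,
                       debt_to_income=0.4, fraud_incidents=0)
example_label = health_label(example)
```

Normalizing every component to the same 0 to 1 range before weighting keeps any single input, such as a raw credit score, from dominating the total.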

5. Recommendation Engine

Rule-based system generating personalized financial advice based on health score, segment, debt ratio, credit utilization, and savings potential.
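A rule-based recommender over these inputs can be sketched as a list of independent checks. Every threshold and message below is hypothetical, as are the `CustomerProfile` and `recommend` names; only the input fields and the "High Risk" segment name come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustomerProfile:
    health_score: float               # 0 to 100
    segment: str                      # e.g. "High Risk"
    debt_ratio: float                 # total debt / yearly income
    credit_utilization: float         # used credit / credit limit
    monthly_savings_potential: float  # income minus spending, per month


def recommend(profile: CustomerProfile) -> List[str]:
    """Return every piece of advice whose rule matches (illustrative rules)."""
    advice: List[str] = []
    if profile.health_score < 40:
        advice.append("Review essential and non-essential spending.")
    if profile.debt_ratio > 0.4:
        advice.append("Prioritize paying down high-interest debt.")
    if profile.credit_utilization > 0.3:
        advice.append("Reduce card balances relative to the credit limit.")
    if profile.monthly_savings_potential > 0:
        advice.append("Set up an automatic monthly savings transfer.")
    if profile.segment == "High Risk":
        advice.append("Enable transaction alerts on all cards.")
    if not advice:
        advice.append("No action needed; keep current habits.")
    return advice


at_risk = CustomerProfile(35, "High Risk", 0.5, 0.6, 0.0)
healthy = CustomerProfile(90, "Low Debt Stable", 0.1, 0.1, 0.0)
```

Keeping each rule independent means a customer can receive several recommendations at once, and new rules can be added without touching existing ones.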

6. Expense Forecasting

Compares two forecasting models:

  • Prophet — MAPE: 13.37% ✅
  • ARIMA — MAPE: 18.62%

Final model: Prophet with yearly seasonality, forecasting 6 months ahead per customer.
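For reference, MAPE (mean absolute percentage error) can be computed as in this minimal sketch. The monthly figures are invented, and skipping months with zero actual spend is an assumption; the notebook may treat those cases differently.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Periods with an actual value of zero are skipped to avoid division
    by zero (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Invented monthly spend for one customer, for illustration only.
actual_spend = [100.0, 200.0, 400.0]
forecast_spend = [110.0, 180.0, 400.0]

error_pct = mape(actual_spend, forecast_spend)  # errors of 10%, 10%, and 0%
```

A lower MAPE means forecasts sit closer to actual spend in relative terms, which is why the 13.37% model is preferred over the 18.62% one.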


📊 Dashboard Pages

| Page | Description |
|------|-------------|
| Overview | Key metrics, spending trends, transaction distribution |
| Customer Analytics | Segments, credit scores, income vs debt, category spending |
| Fraud Intelligence | Fraud trends, live prediction, fraud by hour/category/type |
| Financial Health | Health scores, distributions, recommendations |
| Expense Forecasting | Historical spending, 6-month forecast per client |

🛠 Tech Stack

| Tool | Purpose |
|------|---------|
| DuckDB | Data storage and querying |
| Pandas | Data manipulation |
| Scikit-learn | Clustering and preprocessing |
| XGBoost | Fraud detection |
| TensorFlow/Keras | Autoencoder |
| Prophet | Expense forecasting |
| Statsmodels | ARIMA forecasting |
| Streamlit | Dashboard |
| Plotly | Visualizations |

📈 Key Findings

  • 13.3M transactions spanning 2010–2019
  • Fraud rate of 0.1% — heavily imbalanced dataset
  • 64.7% of customers are financially vulnerable
  • Online transactions have a 3x higher average spend than swipe transactions
  • Spending peaks in May and September annually
  • XGBoost detects 86% of fraud cases at 0.3 threshold

📜 License

This project is for educational and portfolio purposes only. Dataset credit: Caixabank Tech, 2024 AI Hackathon.

About

WealthLab AI is an end-to-end machine learning platform for financial intelligence, combining customer segmentation, fraud detection, risk profiling, and forecasting. It analyzes banking transactions, card data, and user behavior to generate insights, predictions, and personalized financial recommendations using ML and deep learning models.
