💰 WEALTHLAB-AI

AI-Powered Financial Intelligence & Banking Analytics Platform

WEALTHLAB-AI is an end-to-end financial analytics platform built on real banking data from Caixabank Tech's 2024 AI Hackathon. It combines machine learning, time-series forecasting, and interactive visualizations to deliver fraud detection, customer segmentation, financial health scoring, and expense forecasting.

Dataset: Transactions Fraud Datasets — Kaggle

📁 Project Structure

WEALTHLAB-AI/
├── Data/
│   ├── Raw/                        ← Place original dataset files here
│   │   ├── transactions_data.csv
│   │   ├── cards_data.csv
│   │   ├── users_data.csv
│   │   ├── train_fraud_labels.json
│   │   └── mcc_codes.json
│   └── Processed/                  ← Cleaned CSVs saved here
│       ├── transactions_cleaned.csv
│       ├── cards_cleaned.csv
│       ├── users_cleaned.csv
│       ├── fraud_labels_cleaned.csv
│       └── mcc_codes_cleaned.csv
├── Database/
│   └── wealthlab.duckdb            ← DuckDB persistent database
├── Models/
│   ├── fraud_xgb.pkl               ← Trained XGBoost fraud model
│   ├── fraud_scaler.pkl            ← Feature scaler
│   └── fraud_threshold.pkl         ← Optimal prediction threshold
├── notebooks/
│   ├── transactions_cleaning.ipynb
│   ├── cards_cleaning.ipynb
│   ├── users_cleaning.ipynb
│   ├── fraud_labels_cleaning.ipynb
│   ├── mcc_codes_cleaning.ipynb
│   ├── merging_and_integration.ipynb
│   ├── feature_engineering.ipynb
│   ├── customer_segmentation.ipynb
│   ├── fraud_detection.ipynb
│   ├── financial_health_scoring.ipynb
│   ├── recommendation_engine.ipynb
│   └── expense_forecasting.ipynb
├── app.py                          ← Streamlit dashboard
├── requirements.txt
└── README.md

🚀 Getting Started

Prerequisites

Python 3.11.9

1. Clone the repository

git clone https://github.com/yourusername/WEALTHLAB-AI.git
cd WEALTHLAB-AI

2. Create a virtual environment

python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

3. Install dependencies

pip install -r requirements.txt

4. Add the dataset

Download the dataset from Kaggle and place all files inside Data/Raw/.

5. Run the notebooks in order

Run each notebook inside notebooks/ in the following order:

transactions_cleaning.ipynb
cards_cleaning.ipynb
users_cleaning.ipynb
fraud_labels_cleaning.ipynb
mcc_codes_cleaning.ipynb
merging_and_integration.ipynb
feature_engineering.ipynb
customer_segmentation.ipynb
fraud_detection.ipynb
financial_health_scoring.ipynb
recommendation_engine.ipynb
expense_forecasting.ipynb
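
If you prefer not to open each notebook by hand, the order above can be scripted. The following is a minimal sketch, not part of the repository: it assumes `jupyter` and `nbconvert` are installed in the active environment (they may not be listed in requirements.txt), and the helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Execution order copied from the list above.
NOTEBOOK_ORDER = [
    "transactions_cleaning.ipynb",
    "cards_cleaning.ipynb",
    "users_cleaning.ipynb",
    "fraud_labels_cleaning.ipynb",
    "mcc_codes_cleaning.ipynb",
    "merging_and_integration.ipynb",
    "feature_engineering.ipynb",
    "customer_segmentation.ipynb",
    "fraud_detection.ipynb",
    "financial_health_scoring.ipynb",
    "recommendation_engine.ipynb",
    "expense_forecasting.ipynb",
]


def build_commands(notebook_dir="notebooks"):
    """Return one nbconvert command per notebook, in execution order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", str(Path(notebook_dir) / name)]
        for name in NOTEBOOK_ORDER
    ]


def run_all(notebook_dir="notebooks"):
    """Execute every notebook in order, stopping at the first failure."""
    for command in build_commands(notebook_dir):
        subprocess.run(command, check=True)


# Call run_all() from the repository root to execute the full pipeline.
```

Because `check=True` raises on the first failing notebook, later notebooks never run against incomplete upstream outputs.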

6. Launch the dashboard

streamlit run app.py

🧠 Modules

1. Data Engineering

Cleans and merges the five raw files into a single master table stored in DuckDB. Cleaning covers dollar-sign formatting in amount fields, null values, negative amounts, and anomalies in online transactions.
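
To illustrate the kind of cleaning involved, here is a small standard-library sketch of parsing currency strings. It is not the notebook code. The sample rows, the `is_refund` flag, and the choice to drop rows without a usable amount are assumptions made for the example; the notebooks may treat these cases differently.

```python
def parse_amount(raw):
    """Convert a string such as "$1,234.56" or "$-77.00" to a float.

    Returns None when the value is missing or cannot be parsed.
    """
    if raw is None:
        return None
    text = str(raw).strip().replace("$", "").replace(",", "")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical sample rows, shaped like a slice of transactions_data.csv.
raw_rows = [
    {"id": 1, "amount": "$14.57"},
    {"id": 2, "amount": "$-77.00"},
    {"id": 3, "amount": "$1,200.00"},
    {"id": 4, "amount": ""},
]

cleaned_rows = []
for row in raw_rows:
    amount = parse_amount(row["amount"])
    if amount is None:
        continue  # Assumption: rows with no usable amount are dropped.
    cleaned_rows.append({
        "id": row["id"],
        "amount": abs(amount),
        "is_refund": amount < 0,  # Assumption: negative values are refunds.
    })
```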

2. Customer Segmentation

Applies K-Means clustering to 1,219 customers using behavioral features. Produces four segments: Low Debt Stable, High Debt Spenders, Digital Active Users, and High Risk.
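
For readers unfamiliar with K-Means, the sketch below shows the two alternating steps (assign, then update) in plain Python. The project itself uses Scikit-learn per the tech stack; this version is only illustrative. The two features (a debt ratio and an online-spend share, both scaled to 0..1), the toy points, and the naive "first k points" initialization are all assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm.

    Initial centroids are simply the first k points, a simplification
    (scikit-learn defaults to k-means++ initialization).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical customers as (debt_ratio, online_share), already scaled.
customers = [
    (0.10, 0.10), (0.90, 0.10), (0.10, 0.90), (0.90, 0.90),
    (0.15, 0.12), (0.88, 0.15), (0.12, 0.85), (0.92, 0.95),
]
labels, centroids = kmeans(customers, k=4)
```

Cluster indices carry no meaning on their own; segment names such as "High Risk" are assigned afterwards by inspecting each cluster's centroid.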

3. Fraud Detection

Compares two models:

Autoencoder — unsupervised anomaly detection, ROC-AUC: 0.77
XGBoost — supervised classification, ROC-AUC: 0.85 ✅

Final model: XGBoost with a decision threshold of 0.3 (lower than the usual 0.5 default, which favors recall over precision), achieving 86% fraud recall on roughly 13M transactions.
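
The effect of the threshold can be seen with a small standard-library sketch. The labels and probabilities below are made up for illustration, and the function name `recall_precision_at` is hypothetical. Whether the project compares with `>=` or `>` is also an assumption.

```python
def recall_precision_at(y_true, fraud_proba, threshold=0.3):
    """Apply a probability threshold and return (recall, precision)."""
    preds = [1 if p >= threshold else 0 for p in fraud_proba]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up labels (1 = fraud) and model probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
fraud_proba = [0.05, 0.35, 0.32, 0.80, 0.10, 0.45, 0.20, 0.02]

low = recall_precision_at(y_true, fraud_proba, threshold=0.3)
high = recall_precision_at(y_true, fraud_proba, threshold=0.5)
```

On this toy data the 0.3 threshold catches all three fraud cases at the cost of one false alarm, while 0.5 misses two of them. That trade-off is usually acceptable when fraud is rare and missed cases are expensive.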

4. Financial Health Scoring

Custom weighted formula combining savings ratio, credit score, debt-to-income ratio, and fraud history. Scores customers from 0–100 across four labels: Excellent, Stable, Moderate Risk, Financially Vulnerable.
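
The README does not state the weights or label cut-offs, so the sketch below only shows the general shape of a weighted score. Every number in it is an assumption: the 30/30/30/10 weights, the 300 to 850 credit-score range, the fraud penalty, and the 80/60/40 label boundaries.

```python
# Hypothetical weights in score points; they sum to 100.
WEIGHTS = {"savings": 30, "credit": 30, "debt": 30, "fraud": 10}


def clamp(value, low=0.0, high=1.0):
    """Restrict value to the range [low, high]."""
    return max(low, min(high, value))


def health_score(savings_ratio, credit_score, debt_to_income, fraud_count):
    """Combine four normalized components into a 0-100 score."""
    components = {
        "savings": clamp(savings_ratio),
        "credit": clamp((credit_score - 300) / 550),  # assumes a 300-850 scale
        "debt": 1.0 - clamp(debt_to_income),          # less debt scores higher
        "fraud": 1.0 / (1 + fraud_count),             # each incident lowers it
    }
    score = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(score, 1)


def health_label(score):
    """Map a score to one of the four labels (cut-offs are hypothetical)."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Stable"
    if score >= 40:
        return "Moderate Risk"
    return "Financially Vulnerable"


best = health_score(savings_ratio=1.0, credit_score=850, debt_to_income=0.0, fraud_count=0)
worst = health_score(savings_ratio=0.0, credit_score=300, debt_to_income=1.0, fraud_count=5)
```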

5. Recommendation Engine

Rule-based system generating personalized financial advice based on health score, segment, debt ratio, credit utilization, and savings potential.
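
A rule-based recommender of this kind can be sketched as a chain of independent checks. The thresholds, field names, and advice wording below are hypothetical and are not taken from recommendation_engine.ipynb.

```python
def recommend(profile):
    """Return a list of advice strings for one customer profile (a dict)."""
    tips = []
    if profile["health_score"] < 40:
        tips.append("Prioritize an emergency fund before new spending.")
    if profile["debt_ratio"] > 0.40:
        tips.append("Pay down high-interest debt to lower your debt ratio.")
    if profile["credit_utilization"] > 0.30:
        tips.append("Keep card balances below 30% of your credit limit.")
    if profile["savings_potential"] > 0:
        tips.append(
            f"You could set aside about ${profile['savings_potential']:,.0f} per month."
        )
    if profile["segment"] == "High Risk":
        tips.append("Review recent transactions and enable card alerts.")
    # Fall back to a neutral message when no rule fires.
    return tips or ["Your finances look healthy; keep your current habits."]


# Hypothetical customer profile.
example = {
    "health_score": 35,
    "segment": "High Debt Spenders",
    "debt_ratio": 0.55,
    "credit_utilization": 0.20,
    "savings_potential": 0,
}
advice = recommend(example)
```

Keeping each rule independent means a customer can receive several tips at once, and new rules can be added without touching existing ones.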

6. Expense Forecasting

Compares two forecasting models:

Prophet — MAPE: 13.37% ✅
ARIMA — MAPE: 18.62%

Final model: Prophet with yearly seasonality, forecasting 6 months ahead per customer.
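
MAPE (mean absolute percentage error) is the average of |actual - forecast| / |actual|, expressed as a percentage, so lower is better. The sketch below shows the calculation with made-up monthly totals. Skipping months whose actual value is zero is an assumption about how the edge case is handled.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Periods with an actual value of zero are skipped to avoid
    division by zero (an assumption, not the notebook's stated rule).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Made-up monthly spend and two competing forecasts.
actual = [100.0, 200.0, 400.0]
forecast_a = [110.0, 180.0, 400.0]
forecast_b = [130.0, 150.0, 380.0]

scores = {"model_a": mape(actual, forecast_a), "model_b": mape(actual, forecast_b)}
best_model = min(scores, key=scores.get)
```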

📊 Dashboard Pages

| Page | Description |
| --- | --- |
| Overview | Key metrics, spending trends, transaction distribution |
| Customer Analytics | Segments, credit scores, income vs. debt, category spending |
| Fraud Intelligence | Fraud trends, live prediction, fraud by hour/category/type |
| Financial Health | Health scores, distributions, recommendations |
| Expense Forecasting | Historical spending, 6-month forecast per client |

🛠 Tech Stack

| Tool | Purpose |
| --- | --- |
| DuckDB | Data storage and querying |
| Pandas | Data manipulation |
| Scikit-learn | Clustering and preprocessing |
| XGBoost | Fraud detection |
| TensorFlow/Keras | Autoencoder |
| Prophet | Expense forecasting |
| Statsmodels | ARIMA forecasting |
| Streamlit | Dashboard |
| Plotly | Visualizations |

📈 Key Findings

13.3M transactions spanning 2010–2019
Fraud rate of 0.1%, which makes the dataset heavily imbalanced
64.7% of customers are classified as Financially Vulnerable
Online transactions have a 3x higher average spend than swipe transactions
Spending peaks in May and September each year
XGBoost detects 86% of fraud cases at a 0.3 threshold

📜 License

This project is for educational and portfolio purposes only. Dataset credit: Caixabank Tech, 2024 AI Hackathon.
