WEALTHLAB-AI is an end-to-end financial analytics platform built on real banking data from Caixabank Tech's 2024 AI Hackathon. It combines machine learning, time-series forecasting, and interactive visualizations to deliver fraud detection, customer segmentation, financial health scoring, and expense forecasting.
Dataset: Transactions Fraud Datasets — Kaggle
WEALTHLAB-AI/
├── Data/
│ ├── Raw/ ← Place original dataset files here
│ │ ├── transactions_data.csv
│ │ ├── cards_data.csv
│ │ ├── users_data.csv
│ │ ├── train_fraud_labels.json
│ │ └── mcc_codes.json
│ └── Processed/ ← Cleaned CSVs saved here
│ ├── transactions_cleaned.csv
│ ├── cards_cleaned.csv
│ ├── users_cleaned.csv
│ ├── fraud_labels_cleaned.csv
│ └── mcc_codes_cleaned.csv
├── Database/
│ └── wealthlab.duckdb ← DuckDB persistent database
├── Models/
│ ├── fraud_xgb.pkl ← Trained XGBoost fraud model
│ ├── fraud_scaler.pkl ← Feature scaler
│ └── fraud_threshold.pkl ← Optimal prediction threshold
├── notebooks/
│ ├── transactions_cleaning.ipynb
│ ├── cards_cleaning.ipynb
│ ├── users_cleaning.ipynb
│ ├── fraud_labels_cleaning.ipynb
│ ├── mcc_codes_cleaning.ipynb
│ ├── merging_and_integration.ipynb
│ ├── feature_engineering.ipynb
│ ├── customer_segmentation.ipynb
│ ├── fraud_detection.ipynb
│ ├── financial_health_scoring.ipynb
│ ├── recommendation_engine.ipynb
│ └── expense_forecasting.ipynb
├── app.py ← Streamlit dashboard
├── requirements.txt
└── README.md
- Python 3.11.9
git clone https://github.com/yourusername/WEALTHLAB-AI.git
cd WEALTHLAB-AI
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # Mac/Linux
pip install -r requirements.txt

Download the dataset from Kaggle and place all files inside Data/Raw/.
Run each notebook inside notebooks/ in the following order:
1. transactions_cleaning.ipynb
2. cards_cleaning.ipynb
3. users_cleaning.ipynb
4. fraud_labels_cleaning.ipynb
5. mcc_codes_cleaning.ipynb
6. merging_and_integration.ipynb
7. feature_engineering.ipynb
8. customer_segmentation.ipynb
9. fraud_detection.ipynb
10. financial_health_scoring.ipynb
11. recommendation_engine.ipynb
12. expense_forecasting.ipynb

streamlit run app.py

Cleans and merges all 5 raw files into a single master table stored in DuckDB. Handles dollar sign formatting, null values, negative amounts, and online transaction anomalies.
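The amount cleanup mentioned above can be sketched as follows. This is a minimal illustration using only the standard library. The helper name `parse_amount`, the sample strings, and the choice to return `None` for missing values are assumptions made for the example and are not taken from the notebooks, which may handle these cases differently (for instance with Pandas).

```python
from typing import Optional


def parse_amount(raw: Optional[str]) -> Optional[float]:
    """Convert a currency string such as "$1,200.00" or "$-77.00" to a float.

    Hypothetical helper: returns None for missing or empty values so the
    caller can decide how to treat nulls.
    """
    if raw is None:
        return None
    text = raw.strip().replace("$", "").replace(",", "")
    if not text:
        return None
    return float(text)


# Made-up sample values covering the cases listed in the README.
rows = ["$14.57", "$-77.00", "$1,200.00", "", None]
amounts = [parse_amount(r) for r in rows]

# Negative amounts are only counted here. Whether they are refunds to keep
# or errors to drop is a decision the cleaning step has to make.
negative_count = sum(1 for a in amounts if a is not None and a < 0)
```

Keeping the parsing in one small function makes it easy to apply the same rule to every monetary column before the tables are merged.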
K-Means clustering on 1219 customers using behavioral features. Produces 4 segments: Low Debt Stable, High Debt Spenders, Digital Active Users, and High Risk.
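To make the clustering step concrete, here is a minimal sketch of the K-Means procedure (Lloyd's algorithm) in plain Python. It is a stand-in for illustration, not the project's code: the project lists Scikit-learn for clustering, and the two features used here (a debt ratio and an online spending share), the eight sample customers, and the hand-picked starting centroids are all assumptions.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm. Returns (centroids, labels) for a list of tuples."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Hypothetical customers as (debt_ratio, online_spend_share), already scaled to 0..1.
customers = [
    (0.10, 0.20), (0.15, 0.25),
    (0.80, 0.20), (0.85, 0.30),
    (0.20, 0.90), (0.25, 0.85),
    (0.90, 0.90), (0.85, 0.95),
]
# One starting centroid per expected segment, chosen by hand for determinism.
start = [customers[0], customers[2], customers[4], customers[6]]
centroids, labels = kmeans(customers, start)
```

K-Means only returns numbered clusters. Names such as "Low Debt Stable" or "High Risk" are assigned afterwards by inspecting each cluster's centroid.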
Compares two models:
- Autoencoder — unsupervised anomaly detection, ROC-AUC: 0.77
- XGBoost — supervised classification, ROC-AUC: 0.85 ✅
Final model: XGBoost with threshold 0.3, achieving 86% fraud recall on 13M transactions.
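The effect of the 0.3 threshold can be illustrated with a small sketch. The scores and labels below are made up for the example, and the function names are hypothetical. In the real pipeline the scores would presumably come from the trained classifier's predicted probabilities.

```python
def predict_with_threshold(probabilities, threshold=0.3):
    """Label a transaction as fraud (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true, y_pred):
    """Share of actual fraud cases that were flagged."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(y_true)
    return true_positives / actual_positives if actual_positives else 0.0


# Made-up fraud scores and ground-truth labels, for illustration only.
scores = [0.05, 0.35, 0.10, 0.45, 0.90, 0.20, 0.32, 0.02]
labels = [0, 1, 0, 1, 1, 0, 0, 0]

flags_default = predict_with_threshold(scores, 0.5)
flags_lowered = predict_with_threshold(scores, 0.3)

recall_default = recall(labels, flags_default)  # only the 0.90 case is caught
recall_lowered = recall(labels, flags_lowered)  # all three fraud cases are caught

# Lowering the threshold also flags one legitimate transaction (score 0.32).
false_positives_lowered = sum(
    1 for t, p in zip(labels, flags_lowered) if t == 0 and p == 1
)
```

On a heavily imbalanced dataset, a threshold below 0.5 trades extra false positives for higher fraud recall, which is usually the preferred direction when missed fraud is costlier than a manual review.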
Custom weighted formula combining savings ratio, credit score, debt-to-income ratio, and fraud history. Scores customers from 0–100 across four labels: Excellent, Stable, Moderate Risk, Financially Vulnerable.
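A weighted score of this kind could look roughly like the sketch below. The weights, the 300 to 850 credit score range, the binary treatment of fraud history, and the label cut-offs are all assumptions chosen for illustration. The README does not state the actual formula.

```python
# Hypothetical weights; they sum to 1 so the score lands in 0..100.
WEIGHTS = {"savings": 0.30, "credit": 0.30, "debt": 0.30, "fraud": 0.10}


def clamp(value, low=0.0, high=1.0):
    """Restrict a value to the [low, high] interval."""
    return max(low, min(high, value))


def health_score(savings_ratio, credit_score, debt_to_income, fraud_count):
    """Combine four normalised components into a 0..100 score."""
    components = {
        "savings": clamp(savings_ratio),
        "credit": clamp((credit_score - 300) / 550),  # assumes a 300..850 scale
        "debt": 1.0 - clamp(debt_to_income),          # less debt scores higher
        "fraud": 1.0 if fraud_count == 0 else 0.0,    # any fraud history zeroes this part
    }
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def health_label(score):
    """Map a score to one of the four labels (cut-offs are assumptions)."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Stable"
    if score >= 40:
        return "Moderate Risk"
    return "Financially Vulnerable"


# Example: average savings, mid-range credit score, 50% debt-to-income, no fraud.
example_score = health_score(0.5, 575, 0.5, 0)
example_label = health_label(example_score)
```

Normalising every component to the 0 to 1 range before weighting keeps a large-valued input such as the credit score from dominating the result.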
Rule-based system generating personalized financial advice based on health score, segment, debt ratio, credit utilization, and savings potential.
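A rule-based recommender over these inputs can be sketched as a list of independent checks. Every threshold and every piece of advice text below is a placeholder invented for the example. Only the five input signals come from the description above.

```python
def recommend(health_score, segment, debt_ratio, credit_utilization, savings_potential):
    """Return a list of advice strings; each rule fires independently."""
    advice = []
    if health_score < 40:  # hypothetical cut-off
        advice.append("Build an emergency fund before taking on new credit.")
    if debt_ratio > 0.40:  # hypothetical cut-off
        advice.append("Prioritise paying down high-interest debt.")
    if credit_utilization > 0.30:  # hypothetical cut-off
        advice.append("Keep card balances below 30% of the credit limit.")
    if segment == "Digital Active Users":
        advice.append("Review recurring online subscriptions.")
    if savings_potential > 0:
        advice.append(
            f"Consider an automatic monthly transfer of about ${savings_potential:,.0f} to savings."
        )
    if not advice:
        advice.append("No changes suggested; current habits look sustainable.")
    return advice


# Example profiles with made-up values.
at_risk = recommend(35, "High Risk", 0.60, 0.50, 0.0)
healthy = recommend(85, "Low Debt Stable", 0.10, 0.10, 0.0)
```

Keeping each rule as a separate `if` lets one customer receive several pieces of advice and makes individual rules easy to add, remove, or retune.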
Compares two forecasting models:
- Prophet — MAPE: 13.37% ✅
- ARIMA — MAPE: 18.62%
Final model: Prophet with yearly seasonality, forecasting 6 months ahead per customer.
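For reference, MAPE (mean absolute percentage error) averages the absolute forecast error relative to the actual value. The sketch below shows the calculation on made-up numbers. Skipping months with zero actual spend is one common convention and an assumption here, not necessarily what the notebook does.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Periods with an actual value of zero are skipped, because the
    percentage error is undefined there.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Made-up monthly spend for one customer and a made-up forecast.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
error = mape(actual, forecast)  # errors of 10%, 10%, and 0%, averaged
```

Because MAPE is scale-free, it allows a fair comparison between models across customers with very different spending levels.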
| Page | Description |
|---|---|
| Overview | Key metrics, spending trends, transaction distribution |
| Customer Analytics | Segments, credit scores, income vs debt, category spending |
| Fraud Intelligence | Fraud trends, live prediction, fraud by hour/category/type |
| Financial Health | Health scores, distributions, recommendations |
| Expense Forecasting | Historical spending, 6-month forecast per client |
| Tool | Purpose |
|---|---|
| DuckDB | Data storage and querying |
| Pandas | Data manipulation |
| Scikit-learn | Clustering and preprocessing |
| XGBoost | Fraud detection |
| TensorFlow/Keras | Autoencoder |
| Prophet | Expense forecasting |
| Statsmodels | ARIMA forecasting |
| Streamlit | Dashboard |
| Plotly | Visualizations |
- 13.3M transactions spanning 2010–2019
- Fraud rate of 0.1% — heavily imbalanced dataset
- 64.7% of customers are financially vulnerable
- Online transactions have 3x higher average spend than swipe transactions
- Spending peaks in May and September annually
- XGBoost detects 86% of fraud cases at 0.3 threshold
This project is for educational and portfolio purposes only. Dataset credit: Caixabank Tech, 2024 AI Hackathon.