Data Generation Using Modelling and Simulation for Machine Learning

This project demonstrates an end-to-end modelling, simulation, and machine learning pipeline: synthetic data is generated with a discrete-event simulator (SimPy), regression models are trained and evaluated on that data, and the best model is selected using TOPSIS.

The objective is to show how simulation-generated data can be effectively used for machine learning model evaluation and decision-making, strictly following the assignment guidelines.

Simulation Tool Used

Simulator: SimPy

SimPy is an open-source, process-based discrete-event simulation framework for Python.
It appears in Wikipedia's list of simulation software:

List of Computer Simulation Software
https://en.wikipedia.org/wiki/List_of_computer_simulation_software

This satisfies the mandatory requirement that the simulator be chosen only from the provided Wikipedia list.

Problem Description

A queueing system is simulated to model a real-world service process such as:

Customer service desks
Bank counters
Call centers

The simulation captures how different system parameters affect average customer waiting time, which is later predicted using machine learning models.

Simulation Model

System Behavior

Customers arrive randomly into the system
A fixed number of servers provide service
If all servers are busy, customers wait in a queue
The simulation records the average waiting time

Simulation Parameters and Bounds

The following parameters were randomized for each simulation run:

Parameter	Description	Lower Bound	Upper Bound
arrival_rate	Mean time between customer arrivals	1	5
service_rate	Mean service time per customer	1	6
servers	Number of service counters	1	5
max_customers	Number of customers simulated per run	50	200

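These bounds are intended to cover realistic service settings while keeping every run finite (at most 200 customers). They do not guarantee a stable queue: when the mean service time exceeds servers × mean time between arrivals (for example, one server, mean service time 6, mean inter-arrival time 1), the queue keeps growing during the run, so the average waiting time depends strongly on max_customers.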
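Whether a sampled configuration is overloaded can be checked with the standard traffic-intensity ratio from queueing theory. This is a small sketch; the function name `utilization` is hypothetical and it assumes the table's meanings for `arrival_rate` (mean inter-arrival time) and `service_rate` (mean service time).

```python
def utilization(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Traffic intensity per server: rho = mean service time / (servers * mean inter-arrival time).

    Hypothetical helper using the parameter-table meanings. For rho >= 1 the
    queue has no steady state, so the average wait keeps growing with
    max_customers.
    """
    return service_rate / (servers * arrival_rate)


# Opposite corners of the bounds table
print(utilization(arrival_rate=1, service_rate=6, servers=1))  # 6.0  (overloaded)
print(utilization(arrival_rate=5, service_rate=1, servers=5))  # 0.04 (lightly loaded)
```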

Data Generation Methodology

Random values were sampled uniformly within the defined parameter bounds
Each parameter set was passed to the SimPy simulation
The simulation returned the average waiting time
This process was repeated 1000 times

Each simulation run produced one data point.
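The sampling loop above can be sketched as follows. `simulate` stands for any callable that accepts the four parameters as keyword arguments and returns the average waiting time (a hypothetical wrapper around the SimPy model). Drawing `servers` and `max_customers` as integers is an assumption, since the README only says values were sampled uniformly.

```python
import csv
import random
from typing import Callable

# Bounds from the parameter table. Integer draws for servers and
# max_customers are an assumption of this sketch.
BOUNDS = {
    "arrival_rate": (1.0, 5.0),
    "service_rate": (1.0, 6.0),
    "servers": (1, 5),
    "max_customers": (50, 200),
}


def sample_params(rng: random.Random) -> dict:
    """Draw one parameter set uniformly within BOUNDS."""
    return {
        "arrival_rate": rng.uniform(*BOUNDS["arrival_rate"]),
        "service_rate": rng.uniform(*BOUNDS["service_rate"]),
        "servers": rng.randint(*BOUNDS["servers"]),
        "max_customers": rng.randint(*BOUNDS["max_customers"]),
    }


def generate_rows(simulate: Callable[..., float], n_runs: int = 1000, seed: int = 0) -> list:
    """Run one simulation per sampled parameter set; each run yields one row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_runs):
        params = sample_params(rng)
        row = dict(params)
        row["avg_wait_time"] = simulate(**params)
        rows.append(row)
    return rows


def save_csv(rows: list, path: str) -> None:
    """Write the rows with a header line taken from the first row's keys."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# save_csv(generate_rows(simulate_queue), "data/simpy_dataset.csv")
```

Seeding a dedicated `random.Random` instance makes the whole dataset reproducible without touching the global random state.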

Generated Dataset

The dataset contains:

Simulation input parameters
Corresponding average waiting time

Saved at:

data/simpy_dataset.csv

Machine Learning Problem Formulation

Problem Type: Regression

Objective:
Predict the average waiting time based on simulation parameters.

Features

arrival_rate
service_rate
servers
max_customers

Target Variable

avg_wait_time
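A minimal way to separate features from the target, assuming pandas and assuming the CSV columns use exactly the names listed above (the helper `split_xy` is hypothetical):

```python
import pandas as pd

FEATURES = ["arrival_rate", "service_rate", "servers", "max_customers"]
TARGET = "avg_wait_time"


def split_xy(df: pd.DataFrame):
    """Return (X, y): simulation inputs as features, simulated wait as target."""
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    return df[FEATURES], df[TARGET]


# X, y = split_xy(pd.read_csv("data/simpy_dataset.csv"))
```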

Machine Learning Models Evaluated

A total of 8 regression models were trained and evaluated:

Linear Regression
Ridge Regression
Lasso Regression
K-Nearest Neighbors (KNN) Regressor
Support Vector Regressor (SVR)
Decision Tree Regressor
Random Forest Regressor
Gradient Boosting Regressor
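One way to train and compare the eight models is with scikit-learn, sketched below. The library choice, default hyperparameters, the 80/20 split, and the `random_state` values are assumptions; the README does not state how `src/ml_models.py` configures them.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(seed: int = 42) -> dict:
    """The eight regressors listed above, with default hyperparameters (assumed)."""
    return {
        "Linear Regression": LinearRegression(),
        "Ridge Regression": Ridge(),
        "Lasso Regression": Lasso(),
        "KNN Regressor": KNeighborsRegressor(),
        "SVR": SVR(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }


def compare_models(X, y, test_size: float = 0.2, seed: int = 42) -> list:
    """Fit every model on a train split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    results = []
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append({
            "Model": name,
            "MSE": mean_squared_error(y_test, pred),
            "MAE": mean_absolute_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        })
    return results


# pd.DataFrame(compare_models(X, y)).to_csv("results/model_comparison.csv", index=False)
```

SVR and KNN are sensitive to feature scale; wrapping them in a scikit-learn `Pipeline` with `StandardScaler` is a common refinement, though whether the project does so is not stated.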

Evaluation Metrics

Each model was evaluated using the following metrics:

Metric	Type	Description
MSE	Cost	Mean Squared Error
MAE	Cost	Mean Absolute Error
R²	Benefit	Coefficient of Determination

Lower values are better for MSE and MAE
Higher values are better for R²
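The three metrics follow directly from their definitions. This dependency-free sketch (function names are illustrative) makes the cost/benefit direction concrete: MSE and MAE are averages of errors, while R² compares the residual error with the variance of the true values.

```python
def mse(y_true, y_pred):
    """Mean of squared errors (cost: lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of absolute errors (cost: lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot (benefit: higher is better, 1.0 is a perfect fit)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true, y_pred = [3, 5, 7], [2, 5, 9]
print(round(mse(y_true, y_pred), 3))  # 1.667
print(mae(y_true, y_pred))            # 1.0
print(r2(y_true, y_pred))             # 0.375
```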

Evaluation results are saved in:

results/model_comparison.csv

TOPSIS-Based Model Selection

To rank the models objectively across multiple evaluation metrics, TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) was applied.

TOPSIS Configuration

Criteria:
- MSE (Cost)
- MAE (Cost)
- R² (Benefit)
Weights:
- MSE: 0.33
- MAE: 0.33
- R²: 0.34

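For each model, TOPSIS computes:

Its distance from the ideal best solution (the best weighted, normalized value of every criterion)
Its distance from the ideal worst solution
A closeness score, distance to worst / (distance to best + distance to worst), between 0 and 1; models are ranked by this score, highest first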
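The steps above correspond to the classic TOPSIS procedure with vector normalization and Euclidean distances, sketched below in plain Python. The project's `src/topsis.py` may differ in details (for example, the normalization scheme), and the input matrix here is made-up illustrative data, not project results.

```python
import math


def topsis(matrix, weights, impacts):
    """Return TOPSIS closeness scores (higher is better), one per row.

    matrix:  one row per alternative (model), one column per criterion
    weights: one weight per criterion
    impacts: "+" for benefit criteria, "-" for cost criteria
    Assumes no column is all zeros and not every row is identical.
    """
    n_cols = len(weights)
    # 1. Vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 2. Ideal best/worst per column depend on whether the criterion is a cost or a benefit
    cols = list(zip(*weighted))
    best = [max(c) if imp == "+" else min(c) for c, imp in zip(cols, impacts)]
    worst = [min(c) if imp == "+" else max(c) for c, imp in zip(cols, impacts)]
    # 3. Euclidean distances to both ideals, then the closeness score
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Columns: MSE (cost), MAE (cost), R² (benefit), weights from the configuration above.
# The three rows are invented values for illustration only.
scores = topsis([[1.0, 0.8, 0.9], [2.0, 1.2, 0.7], [4.0, 1.6, 0.5]],
                [0.33, 0.33, 0.34], ["-", "-", "+"])
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)  # [0, 1, 2]
```

A model that is best on every criterion coincides with the ideal best solution, so its distance to it is zero and its score is exactly 1.0.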

TOPSIS Results

Saved at:

results/topsis_ranking.csv

Result Visualizations

The following plots were generated and saved automatically:

Visualization	File
Arrival Rate vs Waiting Time	results/arrival_vs_wait.png
Model Comparison (R²)	results/model_r2_comparison.png
Model Comparison (MSE)	results/model_mse_comparison.png
TOPSIS Ranking	results/topsis_ranking.png

These plots are used for analysis and reporting.

Project Directory Structure

Data Generation using Modelling and Simulation for Machine Learning/
│
├── data/
│   └── simpy_dataset.csv
│
├── notebooks/
│   └── full_pipeline.ipynb
│
├── results/
│   ├── arrival_vs_wait.png
│   ├── model_comparison.csv
│   ├── model_mse_comparison.png
│   ├── model_r2_comparison.png
│   ├── topsis_ranking.csv
│   └── topsis_ranking.png
│
├── src/
│   ├── simulator.py
│   ├── data_generation.py
│   ├── ml_models.py
│   ├── topsis.py
│   └── utils.py
│
├── requirements.txt
├── README.md
└── venv/

How to Run the Project

Step 1: Activate Virtual Environment

source venv/bin/activate

Step 2: Run the Notebook

Open and execute:

notebooks/full_pipeline.ipynb
