MSc Data Science @ Federico II | Open to work
ideepkush/README.md

Deepak Kushwaha

Data Scientist & ML Engineer | Deploying ML Systems to Production at Scale

2+ yrs production experience (TCS) · Ex JP Morgan · Ex IIT Indore · 🥇 Google Challenge 2026 Winner


⚡ The Short Version

I don't just build models — I ship them. I've spent 2+ years at Tata Consultancy Services building production data systems across the technology and financial sectors. I containerize, deploy, monitor, and iterate. If it doesn't run in production, I'm not done.

Currently completing my MSc in Data Science at the University of Naples Federico II, where I led a team to 1st Place (STEM) at Google Challenge Campania 2026 — building a generative AI solution that beat out every competing team.


🧠 What I Bring to the Table

  • Full-lifecycle ML: data engineering → model development → containerized deployment → monitoring
  • Production systems: FastAPI + Docker + AWS — not just notebooks
  • Big data at scale: PySpark, Kafka, Spark NLP on high-volume streaming data
  • Real business impact: 2+ years solving problems in technology and financial services at TCS & JP Morgan

🛠️ Core Tech Stack

| Domain | Technologies |
| --- | --- |
| ML & Deep Learning | TensorFlow, Keras, Scikit-learn, Pandas |
| Big Data & Streaming | Apache Spark, Kafka, Hadoop |
| MLOps & Cloud | Docker, AWS, MLflow, FastAPI |
| Languages & Data | Python, SQL, MongoDB, PostgreSQL, Git |

📌 Featured Projects — Proof, Not Promises

Every project below is deployed, documented, or has a live demo. Click through.

What: Production REST API that classifies network traffic as phishing or legitimate — in real time and in batch.

Impact: Deployed and running on AWS EC2 with zero-downtime containerized architecture. Serves predictions via FastAPI endpoints with full experiment tracking.

How it works: Raw network data → feature extraction → trained ML model → prediction served via REST endpoint → results logged in MongoDB → experiments versioned in MLflow.

FastAPI Docker AWS EC2 MLflow MongoDB Scikit-learn
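
The flow described above (features, model, prediction, logging) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the deployed code: the feature names, the `toy_model` scoring rule, and the in-memory `log` list (standing in for a MongoDB collection) are all hypothetical, and the FastAPI and MLflow layers are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical feature names; the README does not list the real ones.
FEATURE_NAMES = ("url_length", "num_dots", "has_ip_host")


def extract_features(record: dict) -> list[float]:
    """Turn one raw record into a fixed-order numeric feature vector."""
    return [float(record.get(name, 0)) for name in FEATURE_NAMES]


def toy_model(features: list[float]) -> float:
    """Stand-in for the trained model: returns a phishing score in [0, 1]."""
    url_length, num_dots, has_ip_host = features
    score = 0.004 * url_length + 0.1 * num_dots + 0.5 * has_ip_host
    return min(score, 1.0)


@dataclass
class PredictionService:
    model: Callable[[list[float]], float]
    threshold: float = 0.5
    # Stand-in for a MongoDB collection: one dict per served prediction.
    log: list[dict] = field(default_factory=list)

    def predict(self, record: dict) -> dict:
        """Real-time path: score one record and log the result."""
        score = self.model(extract_features(record))
        label = "phishing" if score >= self.threshold else "legitimate"
        result = {"label": label, "score": score}
        self.log.append({"input": record, **result})
        return result

    def predict_batch(self, records: list[dict]) -> list[dict]:
        """Batch path: reuse the same single-record logic."""
        return [self.predict(r) for r in records]


service = PredictionService(model=toy_model)
suspicious = service.predict({"url_length": 120, "num_dots": 5, "has_ip_host": 1})
benign = service.predict({"url_length": 20, "num_dots": 1, "has_ip_host": 0})
```

Routing both the real-time and batch paths through one `predict` method keeps the logged records consistent regardless of how a request arrived.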


What: Self-hosted AI agent that discovers job postings, scores them against your profile, and tailors applications — automatically.

Impact: Runs 24/7 in Docker for ~$1.80/month. Replaces hours of manual job hunting with intelligent, automated matching.

How it works: n8n orchestration → job discovery → Claude API scores & matches → PostgreSQL stores state → tailored applications generated automatically.

n8n Claude API PostgreSQL Docker Shell
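
The discover, score, and store-state loop can be illustrated with a small sketch. Everything here is an assumption made for illustration: a keyword-overlap function stands in for the Claude API scoring call, a Python set stands in for the PostgreSQL state table, and the posting fields (`id`, `text`) are hypothetical.

```python
from __future__ import annotations


def score_job(profile_skills: set[str], job_text: str) -> float:
    """Placeholder scorer: fraction of profile skills mentioned in the posting."""
    if not profile_skills:
        return 0.0
    words = set(job_text.lower().replace(",", " ").split())
    return len(profile_skills & words) / len(profile_skills)


def shortlist(profile_skills: set[str], postings: list[dict],
              seen_ids: set[str], min_score: float = 0.5) -> list[str]:
    """Score unseen postings, mark them as seen, return matching ids, best first."""
    matches = []
    for posting in postings:
        if posting["id"] in seen_ids:
            continue  # already processed in an earlier run
        seen_ids.add(posting["id"])
        score = score_job(profile_skills, posting["text"])
        if score >= min_score:
            matches.append((score, posting["id"]))
    return [job_id for _, job_id in sorted(matches, reverse=True)]


profile = {"python", "docker", "sql", "kafka"}
postings = [
    {"id": "a1", "text": "Python, SQL, Docker and Airflow"},
    {"id": "b2", "text": "Java Spring developer"},
    {"id": "c3", "text": "Python Kafka Docker SQL streaming"},
]
seen: set[str] = set()
first_run = shortlist(profile, postings, seen)   # scores all three postings
second_run = shortlist(profile, postings, seen)  # nothing new to score
```

Persisting the set of seen ids is what lets a scheduled run skip postings it has already handled, which matters when each scoring call has a cost.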


What: Deep learning model that predicts which customers will leave — before they do.

Impact: ANN with dropout regularization for production-grade binary classification. Live Streamlit app lets anyone input customer data and get instant predictions.

TensorFlow Keras Streamlit Deep Learning
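
For readers unfamiliar with dropout, the sketch below shows inverted dropout on a list of activations in plain Python. It is a teaching illustration, not the project's model: the real network is built with TensorFlow and Keras, and the function name and arguments here are hypothetical.

```python
from __future__ import annotations

import random


def dropout(activations: list[float], rate: float, training: bool,
            rng: random.Random | None = None) -> list[float]:
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the survivors by 1 / (1 - rate); do nothing at inference time."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]


hidden = [1.0] * 8
train_out = dropout(hidden, rate=0.5, training=True, rng=random.Random(0))
infer_out = dropout(hidden, rate=0.5, training=False)
```

Scaling the surviving units during training keeps the expected activation unchanged, so no rescaling is needed when the model serves predictions.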


What: Hybrid recommender combining content-based filtering and collaborative filtering.

Impact: Deployed interactive app that returns instant movie recommendations using cosine similarity on movie metadata.

Scikit-learn NLP Streamlit Python
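
The content-based half of the recommender can be sketched as cosine similarity over bag-of-words metadata vectors. This is a minimal stand-alone version under assumptions: the catalog titles and tags are made up, the vectorizer is a plain word count rather than whatever the project uses, and the collaborative-filtering half is not shown.

```python
from __future__ import annotations

import math
from collections import Counter


def vectorize(metadata: str) -> Counter:
    """Bag-of-words term counts for one movie's metadata string."""
    return Counter(metadata.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title: str, catalog: dict[str, str], top_n: int = 2) -> list[str]:
    """Return the `top_n` other titles most similar to `title`."""
    query = vectorize(catalog[title])
    scored = [
        (cosine_similarity(query, vectorize(meta)), other)
        for other, meta in catalog.items()
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Hypothetical catalog: title -> space-separated metadata tags.
catalog = {
    "Space Saga": "scifi space adventure",
    "Star Quest": "scifi space war",
    "Love Letters": "romance drama",
}
top_match = recommend("Space Saga", catalog, top_n=1)
```

Here "Space Saga" and "Star Quest" share two of three tags (similarity 2/3), while "Love Letters" shares none, so `top_match` contains only "Star Quest".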


What: Distributed pipeline for real-time sentiment and fault classification on high-volume data streams.

Impact: Handles streaming data at scale using PySpark + Kafka, with NLP classification applied in real time — not batch.

PySpark Kafka Spark NLP Big Data


What: Classification pipeline predicting whether a customer will subscribe to a term deposit.

Impact: Full ML pipeline with PCA dimensionality reduction and GridSearchCV hyperparameter tuning. Evaluated with ROC-AUC rather than accuracy, because the classes are imbalanced and accuracy alone would reward always predicting the majority class.

Scikit-learn ROC-AUC GridSearchCV Random Forest
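
To show why ROC-AUC suits imbalanced data, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as half. The labels and scores are invented for illustration, and the project itself presumably relies on scikit-learn's implementation rather than this function.

```python
from __future__ import annotations


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Invented, imbalanced example: 4 negatives, 2 positives.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.6, 0.9]
auc = roc_auc(labels, scores)

# A model that gives every customer the same score carries no ranking signal.
constant_auc = roc_auc(labels, [0.5] * len(labels))
```

Seven of the eight positive-negative pairs are ordered correctly, so `auc` is 0.875. The constant scorer gets 0.5, even though labelling everyone as the majority class would still be 67% accurate on this data.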


📈 Career Timeline

| Role | Organisation | Period |
| --- | --- | --- |
| 🏆 Google Challenge 2026, 1st Place (STEM) | Generative AI (Gemini + NotebookLM Pro) | 2026 |
| 🎓 MSc Data Science | University of Naples Federico II | Dec 2025 – Present |
| 💼 Data Analyst | Tata Consultancy Services | Aug 2023 – Nov 2025 |
| 💼 Intern | JP Morgan Chase | Jul – Dec 2022 |
| 🔬 Research Intern | IIT Indore | Oct – Nov 2022 |
| 🎓 BTech Civil Eng. | Jamia Millia Islamia (1st Div. Honours) | 2019 – 2023 |


Open to opportunities in Data Science, ML Engineering & MLOps
LinkedIn · Portfolio · deepakkushwaha771@gmail.com

Popular repositories

  1. jenkins-practice (Public)

    CI/CD pipeline experiments with Jenkins — build automation, pipeline-as-code, and integration workflows.

  2. Data-Science-Projects (Public)

    Production-grade data science projects: Phishing Detection API (FastAPI + Docker + AWS), Spark+Kafka streaming pipeline, Bank Marketing classifier, A/B testing, NLP models, and more.

    Jupyter Notebook

  3. mlproject (Public)

    End-to-end ML project with modular pipeline architecture — data ingestion, transformation, model training, and prediction. Python + Scikit-learn.

    Python

  4. movie-rec-main (Public)

    Content-based + collaborative filtering movie recommender using cosine similarity. Deployed as an interactive Streamlit app.

    Python

  5. deep-learning-churn-prediction (Public)

    Customer churn prediction using ANN with dropout regularization. Deployed live on Streamlit. Built with TensorFlow + Keras.

    Jupyter Notebook

  6. Job-Application-Automation-System (Public)

    Self-hosted AI pipeline that discovers, scores & tailors job applications automatically. n8n + Claude API + PostgreSQL + Docker. ~$1.80/month.

    Shell