KabNath/README.md


Multi-agent AI & GPU systems for wireless, and the same toolkit applied to markets

PhD candidate at National Taipei University of Technology working on multi-agent deep reinforcement learning for UAV-assisted networks, reconfigurable intelligent surfaces (RIS), and space-air-ground integrated networks (SAGIN), with a hands-on focus on GPU-accelerated PHY (CUDA, cuSolver/cuBLAS, NVIDIA Sionna). I also apply the same RL/ML toolkit to systematic trading: regime detection, risk control, and production ML. 🏆 4.00 / 4.00 GPA


🛠 Tech stack

Python PyTorch TensorFlow CUDA C++ NumPy Pandas Sionna QuantConnect Linux

Specialty domains: OFDM PHY · MIMO · LDPC · 5G NR · RIS · O-RAN · Federated Learning · MADDPG · PPO · CUDA kernels · Systematic trading


🛰 AI-native wireless & 6G

End-to-end OFDM PHY + AI-RAN stack where every claim is runnable. Real CUDA C++ MMSE channel estimation (custom kernel + cuSolver Cpotrf/Cpotrs + cuBLAS Cgemm), a Sionna 5G LDPC + TR 38.901 TDL BLER link, and link adaptation (OLLA / model-based greedy / model-free learned policy) driven by the measured BLER curves. Result: 133× speedup over NumPy on an RTX 4090 (verified to ~1e-4 vs a CPU reference) · model-free policy matches a tuned OLLA / greedy from ACK/NACK feedback alone, at the lowest BLER · figures, green CI, and an NVIDIA_REVIEW.md design/verification guide
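As a rough illustration of the solver path named above, here is a minimal pure-Python sketch of an LMMSE channel estimate computed with a Cholesky factor-then-solve, the same split that cuSolver's potrf/potrs pair provides. It assumes the textbook form h_mmse = R (R + sigma^2 I)^-1 h_ls with a known Hermitian covariance R. The function names are hypothetical and this is not the repository's CUDA code.

```python
def cholesky(a):
    """Return lower-triangular L with a = L L^H (a must be Hermitian positive definite)."""
    n = len(a)
    L = [[0j] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j].real - sum(abs(L[j][k]) ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = complex(s ** 0.5)
        for i in range(j + 1, n):
            t = a[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = t / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^H) x = b by forward then backward substitution."""
    n = len(b)
    y = [0j] * n
    for i in range(n):  # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0j] * n
    for i in reversed(range(n)):  # backward: L^H x = y
        acc = sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - acc) / L[i][i].conjugate()
    return x


def mmse_estimate(r_hh, h_ls, noise_var):
    """h_mmse = R (R + noise_var * I)^-1 h_ls (assumed textbook LMMSE form)."""
    n = len(h_ls)
    a = [[r_hh[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    x = cholesky_solve(cholesky(a), h_ls)                 # "potrf" then "potrs"
    return [sum(r_hh[i][j] * x[j] for j in range(n))      # matrix apply ("gemm")
            for i in range(n)]
```

With an identity covariance and unit noise variance the estimate is simply the least-squares estimate scaled by one half, which makes a convenient sanity check.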

A CUDA kernel-optimization case study on the MMSE-apply complex GEMM: naive → shared-memory tiled → cuBLAS, benchmarked on an RTX 4090, with an Nsight Compute profiling methodology. Keeps a NumPy/CuPy reference and checks correctness against a CPU baseline. Result: clean naive→tiled→cuBLAS performance breakdown · GPU == CPU to ~1e-4 · CUDA compile-checked in CI · the GPU-kernel companion to the flagship
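The tiling idea can be sketched in plain Python as a blocked complex matrix multiply checked against a naive reference. This only mirrors the loop structure of a shared-memory tiled kernel and says nothing about GPU performance. The tile size and function names are assumptions for illustration, not the repository's kernels.

```python
def gemm_naive(a, b):
    """Reference C = A @ B with one dot product per output element."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_tiled(a, b, tile=2):
    """Blocked C = A @ B: accumulate one (tile x tile) block of C at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0j] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # On a GPU, a thread block would stage these A and B tiles
                # in shared memory before the inner accumulation.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x[i][j] - y[i][j])
               for i in range(len(x)) for j in range(len(x[0])))
```

Comparing `gemm_tiled` against `gemm_naive` with `max_abs_diff` follows the same pattern as the GPU-vs-CPU correctness check described above, including matrix sizes that are not a multiple of the tile.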

Standalone deep-RL study for 5G NR link adaptation: self-contained PyTorch PPO vs an OLLA industry baseline, 28-index MCS table (3GPP TS 38.214), non-stationary SNR with mobility/handover scenarios. Result: PPO learns a competitive policy from scratch (~3 min CPU training), evaluated fairly head-to-head vs OLLA · 15/15 unit tests
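For context, the OLLA baseline can be sketched as below, assuming the common formulation in which an SNR offset rises slightly on each ACK and drops by a larger step on each NACK, with the step ratio set by the target BLER. The class name, step size, and threshold list are hypothetical and are not taken from the repository.

```python
class Olla:
    """Outer-loop link adaptation: track an SNR offset from ACK/NACK feedback."""

    def __init__(self, target_bler=0.1, step_down_db=1.0):
        self.step_down_db = step_down_db
        # Step ratio chosen so the offset is stationary at the target BLER.
        self.step_up_db = step_down_db * target_bler / (1.0 - target_bler)
        self.offset_db = 0.0

    def update(self, ack):
        """Apply one feedback bit and return the new offset in dB."""
        if ack:
            self.offset_db += self.step_up_db
        else:
            self.offset_db -= self.step_down_db
        return self.offset_db


def select_mcs(snr_db, offset_db, thresholds_db):
    """Pick the highest MCS index whose (ascending) SNR threshold is met."""
    effective = snr_db + offset_db
    mcs = 0
    for idx, threshold in enumerate(thresholds_db):
        if effective >= threshold:
            mcs = idx
    return mcs
```

At a 10% target BLER, nine ACKs followed by one NACK return the offset to where it started, which is the equilibrium the step ratio is designed to produce.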

Federated learning for CSI feedback compression: CsiNet autoencoder + FedAvg under non-IID channel statistics, aligned with the 3GPP Release 18 AI-RAN study item. Result: FedAvg matches centralised (~−2 dB NMSE) and beats local-only by ~2 dB · 16/16 unit tests
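The two quantities behind this result can be sketched as follows, assuming standard FedAvg (a sample-count-weighted average of client parameters) and the usual NMSE definition in dB. Model parameters are flattened to plain lists here, and the function names are illustrative rather than the repository's API.

```python
import math


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]


def nmse_db(h_true, h_hat):
    """NMSE = sum |h - h_hat|^2 / sum |h|^2, reported in dB."""
    err = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_hat))
    power = sum(abs(t) ** 2 for t in h_true)
    if err == 0:
        return float("-inf")  # perfect reconstruction
    return 10.0 * math.log10(err / power)
```

Lower (more negative) NMSE means a better CSI reconstruction, so a 2 dB gap between two training schemes corresponds to roughly a 1.6× difference in normalized error power.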

🚧 In active development

  • ris-beamforming-optimizer: RIS phase optimization (manifold + deep-learning methods)
  • oran-resource-allocation-xapp: O-RAN xApp-style resource scheduling with DRL

📈 Systematic trading & quantitative ML

🧪 frix-project: friction-realistic execution benchmark

An honest, reproducible benchmark for intraday order execution on real crypto limit-order-book (LOB) data: classical (TWAP/VWAP/POV/Almgren–Chriss) vs deep RL (PPO/SAC) vs an LLM meta-controller, under one friction model (fees, queue, partial fills, adverse selection). Result (the honest null): no ML method beats a simple classical schedule on BTC or ETH; on ETH, deep RL is significantly worse (PPO, p = 0.035). The benchmark caught four simulator artifacts that had faked a ~2 bps edge; a training-length ablation rules out undertraining · paired-bootstrap significance · 13 unit tests · CI
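One common way to run a paired-bootstrap test on per-episode execution costs is sketched below. It assumes a two-sided test on the mean paired difference, with resampling from the mean-centered differences to approximate the null. The benchmark's exact procedure may differ, and the function name and defaults are hypothetical.

```python
import random


def paired_bootstrap_pvalue(cost_a, cost_b, n_resamples=2000, seed=0):
    """Two-sided p-value for the hypothesis mean(cost_a - cost_b) == 0."""
    diffs = [a - b for a, b in zip(cost_a, cost_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Centering imposes the null hypothesis (zero mean difference).
    centered = [d - observed for d in diffs]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        sample_mean = sum(rng.choice(centered) for _ in range(n)) / n
        if abs(sample_mean) >= abs(observed):
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

Pairing matters because both methods are evaluated on the same episodes: differencing first removes the shared market noise before any resampling happens.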

Regime detection, risk allocation and live health monitoring: the open-sourced production layer of a systematic book. Six-indicator regime classifier (BULL/NEUTRAL/BEAR/CRISIS) with hard crisis gates, inverse-volatility weighting with portfolio vol targeting, and a monitoring battery with a trailing-drawdown kill-switch. Result: explainable-by-construction regime calls · fail-safe tradeable flag for automated halts · 9/9 unit tests · CI
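The risk-allocation pieces can be sketched as follows, assuming weights proportional to 1/volatility, a leverage scalar capped at a maximum, and a drawdown measured from the running equity peak. The thresholds, defaults, and function names are illustrative assumptions, not the engine's actual parameters.

```python
def inverse_vol_weights(vols):
    """Weights proportional to 1/vol, normalized to sum to 1."""
    inv = [1.0 / v for v in vols]
    total = sum(inv)
    return [x / total for x in inv]


def vol_target_leverage(portfolio_vol, target_vol, max_leverage=1.0):
    """Scale exposure toward the target vol without exceeding max_leverage."""
    return min(max_leverage, target_vol / portfolio_vol)


def tradeable(equity_curve, max_drawdown=0.10):
    """Fail-safe flag: False on missing data or a breached trailing drawdown."""
    if not equity_curve:
        return False  # no data: fail closed
    peak = max(equity_curve)
    if peak <= 0:
        return False  # invalid equity: fail closed
    drawdown = 1.0 - equity_curve[-1] / peak
    return drawdown < max_drawdown
```

Returning False whenever the input is empty or invalid is what makes the flag fail-safe: an automated halt is the default unless the data positively supports trading.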

💼 AI-Capital (public sample; full system private)

Multi-asset systematic trading system on QuantConnect: cross-asset momentum, regime detection, volatility targeting and crisis routing, iterated through 11+ versions under strict out-of-sample validation with bias auditing (look-ahead, weight-cap, vol-estimation pitfalls). Status: active paper-trading track record · public repo contains a representative architecture sample

🔬 Alpha research: WorldQuant BRAIN

Systematic factor research on a professional simulation platform; first passing alpha cleared the platform's evaluation thresholds.


📊 Research output

8+ IEEE publications in AI-native wireless networks:

  • Hybrid federated learning with MADDPG for UAV-assisted access networks
  • Reconfigurable intelligent surface optimization for 6G
  • SAGIN architectures with LEO (Starlink) integration
  • Channel estimation and beamforming for next-gen PHY

🔗 ORCID


🏛 Affiliations

  • 🎓 PhD candidate, National Taipei University of Technology, 4.00 / 4.00 GPA
  • 🟢 NVIDIA NGC 6G Developer Program: Member, 2026 cohort
  • 👨‍🏫 Advisors: Prof. Hsin-Piao Lin · Assoc. Prof. Rong-Terng Juang

🌐 Connect

📧 Email · 💼 LinkedIn · 🎓 ORCID · 📍 Taipei, Taiwan · 🇫🇷 🇬🇧 🇹🇼



Pinned

  1. spec-conformance-agentic (Public)

    3GPP/O-RAN document-level conformance agent: vectorless clause-graph navigation (Neo4j) + LangGraph pipeline, with cited, signed verdicts.

    TypeScript

  2. gpu-accelerated-ai-ran-phy-lab (Public)

    Python

  3. cuda-phy-channel-estimation (Public)

    CUDA kernel-optimization case study: naive → tiled → cuBLAS MMSE-apply, benchmarked on RTX 4090, Nsight methodology

    Jupyter Notebook

  4. Market-regime-engine (Public)

    Regime detection, risk allocation and live monitoring for systematic trading

    Python

  5. federated-csi-feedback (Public)

    Federated learning for CSI feedback compression - CsiNet + FedAvg under non-IID channel statistics

    Python

  6. frix-project (Public)

    Reproducible benchmark for intraday order execution on real crypto LOB. Classical vs deep RL vs LLM. Result: no method beats a simple classical schedule (BTC + ETH).

    Python