Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Practice implementing operators and architectures from scratch — the exact ski
[![GitHub stars](https://img.shields.io/github/stars/duoan/TorchCode?style=social)](https://github.com/duoan/TorchCode)
[![GitHub Container Registry](https://img.shields.io/badge/ghcr.io-TorchCode-blue?style=flat-square&logo=github)](https://ghcr.io/duoan/torchcode)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-TorchCode-blue?style=flat-square)](https://huggingface.co/spaces/duoan/TorchCode)
![Problems](https://img.shields.io/badge/problems-40-orange?style=flat-square)
![Problems](https://img.shields.io/badge/problems-41-orange?style=flat-square)
![GPU](https://img.shields.io/badge/GPU-not%20required-brightgreen?style=flat-square)

[![Star History Chart](https://api.star-history.com/svg?repos=duoan/TorchCode&type=Date)](https://star-history.com/#duoan/TorchCode&Date)
Expand All @@ -44,7 +44,7 @@ TorchCode gives you a **structured practice environment** with:

| | Feature | |
|---|---|---|
| 🧩 | **40 curated problems** | The most frequently asked PyTorch interview topics |
| 🧩 | **41 curated problems** | The most frequently asked PyTorch interview topics |
| ⚖️ | **Automated judge** | Correctness checks, gradient verification, and timing |
| 🎨 | **Instant feedback** | Colored pass/fail per test case, just like competitive programming |
| 💡 | **Hints when stuck** | Nudges without full spoilers |
Expand Down Expand Up @@ -198,6 +198,7 @@ If you're interviewing for any role touching LLMs or Transformers, expect at lea
| 37 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/37_dpo_loss.ipynb" target="_blank">DPO Loss</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/37_dpo_loss.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `dpo_loss(chosen, rejected, ...)` | ![Hard](https://img.shields.io/badge/Hard-F44336?style=flat-square) | 💡 | Direct preference optimization, alignment training |
| 38 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/38_grpo_loss.ipynb" target="_blank">GRPO Loss</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/38_grpo_loss.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `grpo_loss(logps, rewards, group_ids, eps)` | ![Hard](https://img.shields.io/badge/Hard-F44336?style=flat-square) | 💡 | Group relative policy optimization, RLAIF, within-group normalized advantages |
| 39 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/39_ppo_loss.ipynb" target="_blank">PPO Loss</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/39_ppo_loss.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `ppo_loss(new_logps, old_logps, advantages, clip_ratio)` | ![Hard](https://img.shields.io/badge/Hard-F44336?style=flat-square) | 💡 | PPO clipped surrogate loss, policy gradient, trust region |
| 41 | <a href="https://github.com/duoan/TorchCode/blob/master/templates/41_opd_loss.ipynb" target="_blank">OPD Loss</a> <a href="https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_opd_loss.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="20"></a> | `opd_loss(student_logits, teacher_logits, ...)` | ![Hard](https://img.shields.io/badge/Hard-F44336?style=flat-square) | 💡 | On-policy distillation, reverse KL, multi-teacher alignment |

---

Expand Down Expand Up @@ -244,7 +245,7 @@ status() # Progress dashboard — solved / attempted / todo
| **1** | 🧱 Foundations | ReLU → Softmax → CE Loss → Dropout → Embedding → GELU → Linear → LayerNorm → BatchNorm → RMSNorm → SwiGLU MLP → Conv2d | 2–3 hrs |
| **2** | 🧠 Attention Deep Dive | SDPA → MHA → Cross-Attn → Causal → GQA → KV Cache → Sliding Window → RoPE → Linear Attn → Flash Attn | 3–4 hrs |
| **3** | 🏗️ Architecture + Training | GPT-2 Block → LoRA → MoE → ViT Patch → Adam → Cosine LR → Grad Clip → Grad Accumulation → Kaiming Init | 3–4 hrs |
| **4** | 🎯 Inference + Advanced | Top-k/p Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss + speed run | 3–4 hrs |
| **4** | 🎯 Inference + Advanced | Top-k/p Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss → OPD Loss + speed run | 3–4 hrs |

---

Expand Down
137 changes: 137 additions & 0 deletions solutions/41_opd_loss_solution.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/41_opd_loss_solution.ipynb)\n",
"\n",
"# Solution: OPD (On-Policy Distillation) Loss\n",
"\n",
"Reference solution."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass\n"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"from torch import Tensor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✅ SOLUTION\n",
"\n",
"def opd_loss(student_logits: Tensor,\n",
" teacher_logits: Tensor,\n",
" teacher_weights: Tensor | None = None,\n",
" mask: Tensor | None = None,\n",
" temperature: float = 1.0) -> Tensor:\n",
" \"\"\"On-Policy Distillation (OPD) reverse-KL loss.\n",
"\n",
" student_logits: (..., V) logits from the student policy\n",
" teacher_logits: (..., V) for one teacher, or (T, ..., V) for T teachers\n",
" teacher_weights: optional (T,) weights, normalized internally\n",
" mask: optional (...) token mask, where 1 = include and 0 = ignore\n",
" temperature: softmax temperature used for distillation\n",
" returns: scalar loss (Tensor)\n",
" \"\"\"\n",
" if teacher_logits.dim() == student_logits.dim():\n",
" teacher_logits = teacher_logits.unsqueeze(0)\n",
" elif teacher_logits.dim() != student_logits.dim() + 1:\n",
" raise ValueError('teacher_logits must have shape (..., V) or (T, ..., V)')\n",
"\n",
" teacher_logits = teacher_logits.detach()\n",
" t = float(temperature)\n",
"\n",
" student_logp = F.log_softmax(student_logits / t, dim=-1)\n",
" student_prob = student_logp.exp()\n",
" teacher_logp = F.log_softmax(teacher_logits / t, dim=-1)\n",
"\n",
" per_teacher_kl = (\n",
" student_prob.unsqueeze(0) * (student_logp.unsqueeze(0) - teacher_logp)\n",
" ).sum(dim=-1)\n",
"\n",
" num_teachers = per_teacher_kl.shape[0]\n",
" if teacher_weights is None:\n",
" weights = torch.full(\n",
" (num_teachers,),\n",
" 1.0 / num_teachers,\n",
" dtype=per_teacher_kl.dtype,\n",
" device=per_teacher_kl.device,\n",
" )\n",
" else:\n",
" weights = teacher_weights.to(dtype=per_teacher_kl.dtype, device=per_teacher_kl.device)\n",
" weights = weights / weights.sum()\n",
"\n",
" view_shape = (num_teachers,) + (1,) * (per_teacher_kl.dim() - 1)\n",
" per_token_kl = (weights.view(view_shape) * per_teacher_kl).sum(dim=0)\n",
"\n",
" if mask is not None:\n",
" mask = mask.to(dtype=per_token_kl.dtype, device=per_token_kl.device)\n",
" loss = (per_token_kl * mask).sum() / mask.sum().clamp_min(1.0)\n",
" else:\n",
" loss = per_token_kl.mean()\n",
"\n",
" return loss * (t ** 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Demo\n",
"student_logits = torch.tensor([[[2.0, 0.0, -1.0], [0.5, 1.0, -0.5]]])\n",
"teacher_logits = torch.tensor([[[1.0, 1.5, -0.5], [0.0, 2.0, -1.0]]])\n",
"print('Loss:', opd_loss(student_logits, teacher_logits).item())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch_judge import check\n",
"check('opd_loss')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
3 changes: 2 additions & 1 deletion templates/00_welcome.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem List (40 problems)\n",
"## Problem List (41 problems)\n",
"\n",
"### 🧱 Fundamentals — \"Implement X from scratch\"\n",
"\n",
Expand Down Expand Up @@ -135,6 +135,7 @@
"| 37 | DPO Loss | 🔴 Hard | [Open](37_dpo_loss.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/templates/37_dpo_loss.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/37_dpo_loss.ipynb\" target=\"_blank\">Colab</a> | [Open](37_dpo_loss_solution.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/solutions/37_dpo_loss_solution.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/37_dpo_loss_solution.ipynb\" target=\"_blank\">Colab</a> |\n",
"| 38 | GRPO Loss | 🔴 Hard | [Open](38_grpo_loss.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/templates/38_grpo_loss.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/38_grpo_loss.ipynb\" target=\"_blank\">Colab</a> | [Open](38_grpo_loss_solution.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/solutions/38_grpo_loss_solution.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/38_grpo_loss_solution.ipynb\" target=\"_blank\">Colab</a> |\n",
"| 39 | PPO Loss | 🔴 Hard | [Open](39_ppo_loss.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/templates/39_ppo_loss.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/39_ppo_loss.ipynb\" target=\"_blank\">Colab</a> | [Open](39_ppo_loss_solution.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/solutions/39_ppo_loss_solution.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/39_ppo_loss_solution.ipynb\" target=\"_blank\">Colab</a> |\n",
"| 41 | OPD Loss | 🔴 Hard | [Open](41_opd_loss.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/templates/41_opd_loss.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_opd_loss.ipynb\" target=\"_blank\">Colab</a> | [Open](41_opd_loss_solution.ipynb) · <a href=\"https://github.com/duoan/TorchCode/blob/master/solutions/41_opd_loss_solution.ipynb\" target=\"_blank\">GitHub</a> · <a href=\"https://colab.research.google.com/github/duoan/TorchCode/blob/master/solutions/41_opd_loss_solution.ipynb\" target=\"_blank\">Colab</a> |\n",
"\n",
"## Useful Commands\n",
"\n",
Expand Down
120 changes: 120 additions & 0 deletions templates/41_opd_loss.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/41_opd_loss.ipynb)\n",
"\n",
"# 🔴 Hard: OPD Loss\n",
"\n",
"Implement the **On-Policy Distillation (OPD)** loss used to distill one or more teacher policies into a student policy on trajectories sampled from the student.\n",
"\n",
"For each token position, OPD minimizes a weighted reverse KL from the student distribution to each teacher distribution:\n",
"\n",
"$$\\mathcal{L}_{\\text{OPD}} = \\sum_j w_j\\,D_{\\text{KL}}\\big(\\pi_\\theta(\\cdot \\mid x)\\;||\\;\\pi_{E_j}(\\cdot \\mid x)\\big)$$\n",
"\n",
"where\n",
"\n",
"$$D_{\\text{KL}}(p||q) = \\sum_v p(v)\\,[\\log p(v) - \\log q(v)].$$\n",
"\n",
"### Signature\n",
"```python\n",
"from torch import Tensor\n",
"\n",
"def opd_loss(student_logits: Tensor,\n",
" teacher_logits: Tensor,\n",
" teacher_weights: Tensor | None = None,\n",
" mask: Tensor | None = None,\n",
" temperature: float = 1.0) -> Tensor:\n",
" \"\"\"OPD reverse-KL distillation loss.\n",
"\n",
" student_logits: (..., V) logits from the student policy\n",
" teacher_logits: (..., V) for one teacher, or (T, ..., V) for T teachers\n",
" teacher_weights: optional (T,) weights, normalized internally\n",
" mask: optional (...) token mask, where 1 = include and 0 = ignore\n",
" temperature: softmax temperature used for distillation\n",
" returns: scalar loss (Tensor)\n",
" \"\"\"\n",
"```"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n",
"try:\n",
" import google.colab\n",
" get_ipython().run_line_magic('pip', 'install -q torch-judge')\n",
"except ImportError:\n",
" pass\n"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"from torch import Tensor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✏️ YOUR IMPLEMENTATION HERE\n",
"\n",
"def opd_loss(student_logits: Tensor,\n",
" teacher_logits: Tensor,\n",
" teacher_weights: Tensor | None = None,\n",
" mask: Tensor | None = None,\n",
" temperature: float = 1.0) -> Tensor:\n",
" pass # reverse KL: sum_v p_student * (log p_student - log p_teacher)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 🧪 Debug\n",
"student_logits = torch.tensor([[[2.0, 0.0, -1.0], [0.5, 1.0, -0.5]]])\n",
"teacher_logits = torch.tensor([[[1.0, 1.5, -0.5], [0.0, 2.0, -1.0]]])\n",
"print('Loss:', opd_loss(student_logits, teacher_logits).item())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ✅ SUBMIT\n",
"from torch_judge import check\n",
"check('opd_loss')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading