diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
new file mode 100644
index 00000000..a8a7daec
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,66 @@
+name: 🐛 Bug report
+description: Report a bug in Project Manager for Java, with a reproducible project so it can be confirmed automatically.
+labels: ["bug"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for filing a bug! A **minimal reproducible project** lets a maintainer (or the Copilot agent) reproduce the issue with an AutoTest UI test and turn it into a regression test. The more precise the repro, the faster the fix.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Describe the bug
+      description: A clear and concise description of what the bug is.
+    validations:
+      required: true
+
+  - type: input
+    id: repro-project
+    attributes:
+      label: Reproducible project
+      description: A link to a public GitHub repo, or note that you attached a zip below. Prefer the smallest project that still shows the bug.
+      placeholder: "https://github.com/<you>/<minimal-repro>"
+    validations:
+      required: true
+
+  - type: textarea
+    id: steps
+    attributes:
+      label: Steps to reproduce
+      description: Exact steps against the project above. Name the affected surface (Java Projects tree, a context-menu / command, Referenced Libraries, export jar, new type, etc.).
+      placeholder: |
+        1. Open the project above in VS Code
+        2. Focus the Java Projects view
+        3. Expand src/main/java > com.example
+        4. Right-click App.java > ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behavior
+      description: What happened instead. Screenshots and the "Java" / "Language Support for Java" output-channel logs help a lot.
+    validations:
+      required: true
+
+  - type: textarea
+    id: versions
+    attributes:
+      label: Environment
+      description: Fill in the versions you are running.
+      value: |
+        - OS:
+        - VS Code version:
+        - Extension Pack for Java / Project Manager for Java version:
+        - JDK version:
+    validations:
+      required: true
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..271dd95f
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: true
+contact_links:
+  - name: 💬 Questions & discussions
+    url: https://github.com/microsoft/vscode-java-dependency/discussions
+    about: Ask usage questions or start a discussion instead of filing a bug.
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 451981cf..a52fbd5a 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,8 +1,19 @@
 # Copilot instructions for vscode-java-dependency
 
+## Bug reproduction
+
+- **Classify the task first — the repro / UI-test flow is opt-in, not automatic.** Use the `repro` skill **only** when the task is to fix or confirm a **reproducible bug** (an issue that carries repro steps + a project, or you are explicitly asked to reproduce/confirm a report). For everything else — new features, refactors, performance work, dependency/version bumps, docs, config, CI, or code cleanup — make a normal PR with the appropriate unit/integration tests and **do not** author a `test/e2e-plans/repro-issue-*.yaml`. No repro plan file means the CI red→green gate never triggers; nothing extra runs.
+- **What always runs vs what is opt-in:** every PR to `main` still gets lint + the existing `java-dep-*` regression E2E (unchanged safety net). The red→green **gate is additional and fires only when the PR contains a `repro-issue-<n>.yaml`.** So the decision to enter this flow is made purely by whether you commit a repro plan. The gate is an **independent re-run and the execution surface for OS-specific plans** — your own run in the agent environment is the primary proof.
+- If a report is **not reproducible** (vague, missing project, environment- or hardware-specific, depends on an external service), do **not** force a reproduction or invent a plan: ask for a minimal repro and label `needs-more-info`, or fix with the best available non-UI test and say so.
+- When you have decided the task **is** a reproducible bug: first decide whether it needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
+- Reproduce with the reporter's project: clone the linked repo as a sibling or recreate the zip/inline sources, then distill it to a **minimal committed fixture**. Do not commit whole user projects or large binaries.
+- **Prove the red→green in your own environment first — CI is a fallback, not a requirement.** Run the plan/test yourself: red on the un-fixed build, green after the fix, **iterating until you observe green** (crash/error = flaky, re-run; assertion-`fail` = fix wrong/incomplete, read `results.json` and iterate). Leave it committed as a regression test (`test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically), and commit the repro plan and the fix **together in one PR**. For an **OS-specific** bug your agent OS cannot reproduce (e.g. a Windows-only bug on a Linux agent), name the plan `repro-issue-<n>-windows.yaml` / `-linux.yaml` and let CI's red→green gate run it on that OS; read the result back with `gh run watch` / `gh run download` and iterate.
+- If no reproducible project is provided and the bug is environment-specific, ask for one and label `needs-more-info` — do not fabricate a fix for an unreproduced bug.
+
 ## UI and E2E tests
 
 - When asked to add, update, run, or debug UI/E2E coverage, prefer the AutoTest YAML workflow under `test/e2e-plans/`.
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
-- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) over screenshot-only checks.
+- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Screenshots help prove a fix (a red run before, a green run after), but never use one as the sole pass/fail authority for the decisive assertion.
+- **Evidence: textual self-run proof + CI-hosted screenshots; never commit binaries.** Raw `test-results/` is git-ignored and screenshots are **never committed to the repo**. For a self-run repro, prove red→green as text on the issue/PR: the decisive failing step and the **actual observed value** from the red run's `results.json`, then the green result. `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads full `test-results/` (screenshots + `results.json`) as artifacts; for a `repro-issue-<n>.yaml` an OS-aware **red→green gate** rebuilds the PR base and runs the plan against base **and** head on the OS(es) the suffix implies, requiring `base ❌ RED → head ✅ GREEN` (`repro-gate-results-<os>-<plan>` artifacts) — link those for the images and for an OS-specific plan you could not run yourself; ordinary regression plans upload `e2e-results-<os>-<plan>`. A maintainer can drag an artifact PNG into a comment for an inline view (`user-images.githubusercontent.com`), still outside git.
diff --git a/.github/instructions/uitest-plan.instructions.md b/.github/instructions/uitest-plan.instructions.md
index 34cc6708..f76b9b5f 100644
--- a/.github/instructions/uitest-plan.instructions.md
+++ b/.github/instructions/uitest-plan.instructions.md
@@ -9,7 +9,7 @@ Test plans under `test/e2e-plans/` are executable YAML files consumed by `@vscja
 
 ## Setup rules
 
-- Use `setup.extension: "vscjava.vscode-java-pack"` plus `setup.vscodeVersion: "stable"` for most scenarios. Installing the Extension Pack for Java pulls in every Java extension the Java Projects view relies on, so there is no need to install `redhat.java` separately.
+- Use `setup.extension: "vscjava.vscode-java-pack"` plus `setup.vscodeVersion: "stable"` for most scenarios. Installing the Extension Pack for Java pulls in every Java extension the Java Projects view relies on, so there is no need to install `redhat.java` separately. Keep `stable` (always the latest release); do **not** pin a concrete version. In the Copilot agent, `.github/workflows/copilot-setup-steps.yml` pre-downloads that latest build and the pack before the firewall is enabled, and `@vscode/test-electron` falls back to the cached build when the run-time version check is blocked, so the plan runs offline without pinning.
 - Install the extension under test from a local VSIX at runtime with `--vsix vscode-java-dependency.vsix` — do not rely on a marketplace copy of `vscjava.vscode-java-dependency`.
 - Use existing in-repo fixtures as the workspace: `../maven` (a `maven-archetype-quickstart` project: `my-app` / `com.mycompany.app` / `App.java`) or `../invisible` (an unmanaged-folder project for referenced-library scenarios). Paths are relative to the test plan file. Do not add large binary fixtures.
 - Referenced-library / classpath commands (`java.project.addLibraries`, `java.project.removeLibrary`, `java.project.addLibraryFolders`, `java.project.refreshLibraries`) only apply to invisible projects — use `../invisible`, not `../maven`, for those.
@@ -31,12 +31,13 @@ action: 'clickViewTitleAction "Java Projects" "Unlink with Editor"'
 
 ## Verification rules
 
-- Add deterministic verification to every meaningful step. The natural-language `verify` field is context for humans and failure analysis; it is not pass/fail authority by itself, and it is auto-passed when a plan runs with `--no-llm`.
+- You do **not** need a verifier on every step. Author the *actions* step-by-step, but gate pass/fail with a deterministic verifier only on the **decisive assertion step(s)** — the step that captures the reported bug — plus any step prone to a silent no-op (see the `expandTreeItem` / free-form action caveat above). Intermediate action steps can rely on AutoTest screenshots instead of their own verifier.
+- The natural-language `verify` field is context for humans and failure analysis; it is not pass/fail authority by itself, and it is auto-passed when a plan runs with `--no-llm`. So the decisive step **must** carry a deterministic verifier, or a `--no-llm` run is a false green.
 - Use `verifyTreeItem` (with `name:`, optional `exact: true`, and `visible: false` for absence) as the authoritative check for Java Projects tree state.
 - Use `verifyFile` after operations that create, modify, or delete files on disk (new type, export jar, permanent delete). VS Code can open duplicate editor tabs with stale buffers, so prefer file-content checks over editor checks after such operations.
 - Use `verifyEditorTab` to assert which file an action opened, and `verifyClipboard` for copy-path commands.
 - On state-check steps whose only assertion is a deterministic verifier, omit the `verify:` field to avoid false LLM failures.
-- Use screenshots only as diagnostics produced by AutoTest; do not make screenshots the only evidence of pass/fail.
+- Screenshots are AutoTest's evidence that an action ran and are the primary artifact for **proving a fix** (a red run before, a green run after). Do not make a screenshot the sole pass/fail authority for the decisive assertion — pair it with a deterministic verifier.
 
 ## Local validation commands
 
diff --git a/.github/scripts/prewarm-vscode.js b/.github/scripts/prewarm-vscode.js
new file mode 100644
index 00000000..d077823d
--- /dev/null
+++ b/.github/scripts/prewarm-vscode.js
@@ -0,0 +1,95 @@
+#!/usr/bin/env node
+/*
+ * Pre-download VS Code + the Java extensions into AutoTest's cache BEFORE the
+ * Copilot coding agent's firewall is enabled.
+ *
+ * AutoTest (`@vscjava/vscode-autotest`) launches VS Code via `@vscode/test-electron`:
+ *   1. downloadAndUnzipVSCode(version)            -> <cwd>/.vscode-test/vscode-<...>
+ *   2. resolveCliArgsFromVSCodeExecutablePath()   -> --extensions-dir=<cwd>/.vscode-test/extensions
+ *   3. code --install-extension <id> --force      -> pulls Marketplace bits into that extensions dir
+ *
+ * The VS Code CDN (update.code.visualstudio.com) and the Marketplace are NOT on the
+ * Copilot agent's default firewall allowlist, so those network calls fail at run time.
+ * This script performs the exact same three operations during `copilot-setup-steps`
+ * (which runs before the firewall), so the caches are warm and the firewalled UI run
+ * hits them offline.
+ *
+ * Because `@vscode/test-electron` derives its cache from `process.cwd()`, this MUST run
+ * from the repository root — the same directory AutoTest runs from at agent time.
+ *
+ * Env overrides:
+ *   VSCODE_VERSION       VS Code channel/version to warm (default: "stable")
+ *   PREWARM_EXTENSIONS   comma-separated extension ids (default: "vscjava.vscode-java-pack")
+ */
+"use strict";
+
+const path = require("path");
+const cp = require("child_process");
+
+function resolveTestElectron() {
+  // Prefer the exact copy that the globally installed AutoTest uses, so the
+  // version and default-cache-path logic match the agent run byte-for-byte.
+  const candidates = [];
+  try {
+    const globalRoot = cp.execSync("npm root -g", { encoding: "utf-8" }).trim();
+    candidates.push(path.join(globalRoot, "@vscjava", "vscode-autotest"));
+    candidates.push(globalRoot);
+  } catch {
+    /* npm not on PATH — fall back to local resolution below */
+  }
+  candidates.push(process.cwd());
+  try {
+    const entry = require.resolve("@vscode/test-electron", { paths: candidates });
+    return require(entry);
+  } catch {
+    // Last resort: a plain require (works if it is a local dependency).
+    return require("@vscode/test-electron");
+  }
+}
+
+async function main() {
+  const version = process.env.VSCODE_VERSION || "stable";
+  const extensions = (process.env.PREWARM_EXTENSIONS || "vscjava.vscode-java-pack")
+    .split(",")
+    .map((s) => s.trim())
+    .filter(Boolean);
+
+  const { downloadAndUnzipVSCode, resolveCliArgsFromVSCodeExecutablePath } = resolveTestElectron();
+
+  console.log(`⬇️  Pre-downloading VS Code "${version}" into ${path.join(process.cwd(), ".vscode-test")} ...`);
+  const vscodePath = await downloadAndUnzipVSCode(version);
+  console.log(`✅ VS Code ready: ${vscodePath}`);
+
+  const [cli, ...baseArgs] = resolveCliArgsFromVSCodeExecutablePath(vscodePath);
+  const extensionsDir = baseArgs.find((a) => a.startsWith("--extensions-dir="))?.slice("--extensions-dir=".length);
+  console.log(`📁 Extensions dir: ${extensionsDir ?? "(default)"}`);
+
+  let failures = 0;
+  for (const ext of extensions) {
+    console.log(`📦 Installing ${ext} (+ Extension Pack members) ...`);
+    try {
+      cp.execFileSync(cli, [...baseArgs, "--install-extension", ext, "--force"], {
+        stdio: "inherit",
+        timeout: 300_000,
+        env: { ...process.env },
+        shell: process.platform === "win32",
+      });
+      console.log(`✅ Installed ${ext}`);
+    } catch (e) {
+      failures++;
+      console.warn(`⚠️  Failed to install ${ext}: ${e.message}`);
+    }
+  }
+
+  if (failures > 0) {
+    // Non-fatal: a missing extension only degrades UI reproduction, and the agent
+    // can still fall back to the non-UI path. Surface it without aborting setup.
+    console.warn(`⚠️  ${failures} extension(s) failed to pre-install; UI reproduction may be degraded.`);
+  }
+  console.log(failures > 0 ? "⚠️  Pre-warm finished with missing extensions." : "🎉 VS Code + Java extensions pre-warmed for AutoTest.");
+}
+
+main().catch((err) => {
+  console.error("❌ Pre-warm failed:", err);
+  process.exit(1);
+});
diff --git a/.github/scripts/repro-gate.js b/.github/scripts/repro-gate.js
new file mode 100644
index 00000000..69fffa41
--- /dev/null
+++ b/.github/scripts/repro-gate.js
@@ -0,0 +1,186 @@
+#!/usr/bin/env node
+// Repro red→green gate judge.
+//
+// Decides, from two AutoTest `results.json` files, whether a repro plan
+// genuinely proves a bug fix: it must FAIL on the un-fixed base build (RED)
+// and PASS on the fixed head build (GREEN). Run by the `repro-gate-*` jobs in
+// .github/workflows/e2eUI.yml, once per repro-issue-<n>.yaml plan per OS.
+//
+// Usage:
+//   node repro-gate.js <baseResultsJson> <headResultsJson> [planName] [os]
+//
+// Exit codes:
+//   0  RED→GREEN proven (base failed a deterministic assertion, head all-pass)
+//   1  gate failed — one of:
+//        NOT_REPRODUCED  base passed        → plan does not reproduce the bug
+//        NOT_FIXED       head still fails or errors → fix does not resolve the bug
+//        INCONCLUSIVE    base crashed/errored, or head crashed (infra flake) → retry
+//
+// Why summary.failed (not the process exit code) decides RED:
+//   `autotest run` exits 1 for BOTH a real assertion failure and a crash /
+//   infra error. Only a deterministic assertion `fail` (summary.failed >= 1,
+//   not `errors`, not `crashed`) counts as a genuine reproduction. A crash on
+//   base would otherwise be mis-read as "reproduced".
+
+"use strict";
+
+const fs = require("fs");
+
+function loadReport(p) {
+  try {
+    const raw = fs.readFileSync(p, "utf8");
+    const json = JSON.parse(raw);
+    return { ok: true, ...json };
+  } catch (e) {
+    return { ok: false, missing: true, loadError: e.message };
+  }
+}
+
+function summaryOf(r) {
+  const s = r.summary || {};
+  return {
+    total: s.total ?? 0,
+    passed: s.passed ?? 0,
+    failed: s.failed ?? 0,
+    errors: s.errors ?? 0,
+    skipped: s.skipped ?? 0,
+  };
+}
+
+function failingSteps(r) {
+  return (r.results || [])
+    .filter((s) => s.status === "fail" || s.status === "error")
+    .map((s) => ({
+      stepId: s.stepId,
+      action: s.action,
+      status: s.status,
+      reason: (s.reason || "").toString().slice(0, 300),
+    }));
+}
+
+function classifyBase(r) {
+  // A trustworthy RED = did not crash AND at least one deterministic
+  // assertion `fail`. Errors / crashes are infra noise, not reproduction.
+  if (!r.ok || r.crashed === true) return "CRASHED";
+  const s = summaryOf(r);
+  if (s.failed >= 1) return "RED";
+  if (s.errors >= 1) return "ERRORED";
+  return "GREEN"; // ran clean, nothing failed → did NOT reproduce
+}
+
+function classifyHead(r) {
+  if (!r.ok || r.crashed === true) return "CRASHED";
+  const s = summaryOf(r);
+  if (s.failed === 0 && s.errors === 0) return "GREEN";
+  return "RED"; // fix build still failing / erroring
+}
+
+function icon(kind) {
+  return { RED: "❌", GREEN: "✅", CRASHED: "💥", ERRORED: "⚠️", ERROR: "⚠️" }[kind] || "❔";
+}
+
+function main() {
+  const [baseJson, headJson, planNameArg, osArg] = process.argv.slice(2);
+  if (!baseJson || !headJson) {
+    console.error("usage: repro-gate.js <baseResultsJson> <headResultsJson> [plan] [os]");
+    process.exit(2);
+  }
+  const plan = planNameArg || "repro-plan";
+  const os = osArg || process.env.RUNNER_OS || "";
+
+  const base = loadReport(baseJson);
+  const head = loadReport(headJson);
+  const baseKind = classifyBase(base);
+  const headKind = classifyHead(head);
+  const baseSum = summaryOf(base);
+  const headSum = summaryOf(head);
+
+  // ── Verdict ──────────────────────────────────────────────
+  let verdict, exit, message;
+  if (baseKind === "CRASHED" || baseKind === "ERRORED") {
+    verdict = "INCONCLUSIVE";
+    exit = 1;
+    message =
+      `Base (un-fixed) run did not produce a clean assertion result ` +
+      `(${baseKind.toLowerCase()}). This is an infrastructure flake, not a ` +
+      `reproduction — re-run the job. If it persists, the editor is not ` +
+      `launching (check the pre-warm / .vscode-test cache).`;
+  } else if (baseKind === "GREEN") {
+    verdict = "NOT_REPRODUCED";
+    exit = 1;
+    message =
+      `The repro plan PASSED on the un-fixed base build, so it does NOT ` +
+      `reproduce the bug (no RED). Tighten the decisive assertion so it ` +
+      `asserts the EXPECTED behaviour and therefore fails on the buggy build.`;
+  } else if (headKind === "CRASHED") {
+    verdict = "INCONCLUSIVE";
+    exit = 1;
+    message =
+      `Base reproduced the bug (RED), but the fixed head run crashed — ` +
+      `infrastructure flake, re-run the job.`;
+  } else if (headKind === "RED") {
+    verdict = "NOT_FIXED";
+    exit = 1;
+    message =
+      `The fix build STILL FAILS the repro plan (no GREEN), so the bug is ` +
+      `not resolved. See the failing head step(s) below.`;
+  } else {
+    verdict = "PROVEN";
+    exit = 0;
+    message = `RED→GREEN proven: the bug reproduces on base and is fixed on head.`;
+  }
+
+  // ── Markdown report ──────────────────────────────────────
+  const title = `Repro red→green gate — \`${plan}\`${os ? ` (${os})` : ""}`;
+  const baseDecisive =
+    baseKind === "RED"
+      ? failingSteps(base).map((s) => `\`${s.stepId}\`: ${s.reason || s.status}`).join("<br>") || "—"
+      : baseKind === "GREEN"
+      ? "no step failed (did not reproduce)"
+      : (base.crashReason || base.loadError || baseKind);
+  const headDecisive =
+    headKind === "GREEN"
+      ? `all ${headSum.total} step(s) passed`
+      : headKind === "RED"
+      ? failingSteps(head).map((s) => `\`${s.stepId}\`: ${s.reason || s.status}`).join("<br>") || "—"
+      : (head.crashReason || head.loadError || headKind);
+
+  const md = [
+    `### ${title}`,
+    ``,
+    `**Verdict: ${exit === 0 ? "✅" : "❌"} ${verdict}** — ${message}`,
+    ``,
+    `| Build | Under test | Result | Steps (p/f/e) | Decisive |`,
+    `|-------|-----------|--------|---------------|----------|`,
+    `| base | \`main\` (un-fixed) | ${icon(baseKind)} ${baseKind} | ${baseSum.passed}/${baseSum.failed}/${baseSum.errors} | ${baseDecisive} |`,
+    `| head | PR (fix) | ${icon(headKind)} ${headKind} | ${headSum.passed}/${headSum.failed}/${headSum.errors} | ${headDecisive} |`,
+    ``,
+    exit === 0
+      ? `> The base build reproduces the bug and the head build fixes it — a genuine regression guard.`
+      : `> Gate blocked: ${verdict}. ${message}`,
+    ``,
+  ].join("\n");
+
+  console.log(md);
+
+  // GitHub job summary
+  const summaryFile = process.env.GITHUB_STEP_SUMMARY;
+  if (summaryFile) {
+    try {
+      fs.appendFileSync(summaryFile, md + "\n");
+    } catch (e) {
+      console.error(`(could not write job summary: ${e.message})`);
+    }
+  }
+
+  // Workflow annotation
+  if (exit === 0) {
+    console.log(`::notice title=Repro gate ${plan}::${verdict} — ${message}`);
+  } else {
+    console.log(`::error title=Repro gate ${plan}::${verdict} — ${message}`);
+  }
+
+  process.exit(exit);
+}
+
+main();
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
new file mode 100644
index 00000000..d6a35790
--- /dev/null
+++ b/.github/skills/repro/SKILL.md
@@ -0,0 +1,161 @@
+---
+name: repro
+description: Reproduce a reported vscode-java-dependency (Project Manager for Java) bug from a GitHub issue, using the reporter's project. Decide whether a UI/E2E test is needed, reproduce with AutoTest when it is, and leave a committed regression test. Use when an issue is assigned to Copilot, when asked to reproduce/confirm a bug, or when triaging a "needs-repro" report.
+---
+
+# Reproduce a reported bug
+
+Use this skill when the task is to fix or confirm a **reproducible bug** in `vscode-java-dependency` (Project Manager for Java) — an issue that carries repro steps + a project, or an explicit request to reproduce/confirm a report.
+
+**Do NOT use this skill (and do not author a `repro-issue-*.yaml`) when the task is not a reproducible bug**, e.g. a new feature, refactor, performance work, dependency/version bump, docs, config, CI, or code cleanup — those are ordinary PRs with ordinary unit/integration tests. Also skip it when a report is **not reproducible** (vague, no project, environment/hardware-specific, external service): ask for a minimal repro and label `needs-more-info`, or fix with the best available non-UI test — never invent a repro plan just to have one. The CI red→green gate only triggers when a `repro-issue-<n>.yaml` is present, so not entering this flow means nothing extra runs.
+
+Goal: turn a bug report into a **deterministic, committed reproduction** that fails before the fix and passes after it. Prefer the smallest reproduction that proves the bug. Not every bug needs a UI test — decide first.
+
+## 1. Extract the report
+
+From the issue body (and the `bug_report` template fields) collect:
+
+- **Repro project** — a public GitHub repo link, an attached zip (a `https://github.com/user-attachments/files/<id>/<name>.zip` link in the issue body), or an inline `pom.xml` / `build.gradle` + sources. If none is provided and the bug is environment-specific, ask for one and label the issue `needs-more-info` instead of guessing.
+- **Steps to reproduce**, **expected** vs **actual** behavior, and the affected surface (tree view, context menu, command id, classpath, export jar, project creation, etc.).
+- **Versions** — VS Code, Extension Pack for Java, JDK, OS.
+
+## 2. Decide: does this need a UI/E2E test?
+
+The reproduction and the fix-proof are two different questions — decide each:
+
+- **Reproduction** can often be non-UI or even a code read, especially for simple, obvious bugs. Prefer the cheapest reproduction that captures the report.
+- **Fix-proof** is where a UI/E2E test earns its cost: a red run before the fix and a green run after, with screenshots, is the strongest evidence for a user-facing bug. If the bug is user-facing, favour leaving a committed UI plan even when you first reproduced it another way.
+- **Prove the red→green with an actual run — first in your own environment.** Your **default proof surface is the agent's own environment**: build the product, run the plan/test yourself, and observe the decisive assertion **fail on the un-fixed code and pass on the fix**. That is the closed loop — no CI approval, and you see the screenshots directly (see §4/§5). CI is a **fallback only for OS-specific bugs your environment cannot reproduce** (e.g. a Windows-only bug when the agent runs on Linux) and an always-on regression net — it is **not** a required step for every repro. Never merely assert red→green in the PR body; make the plan actually reproduce, and **iterate until you have observed it go green**.
+
+**Use a UI/E2E AutoTest plan (`uitest` skill) when the bug is in the user-facing surface**, e.g.:
+
+- Java Projects tree rendering, ordering, labels, icons, or node presence/absence.
+- Context-menu / inline title actions, command palette entries, view focus/reveal.
+- Referenced Libraries / classpath UI (`../invisible` project), export jar, new type creation, link-with-editor, view modes.
+
+**Do NOT use a UI test — reproduce with a unit test or code analysis — when the bug is:**
+
+- Pure logic / data structures reachable from the extension API → add or extend a `test/maven-suite` test.
+- In the Java OSGi backend (`jdtls.ext/**`) → reproduce with a `jdtls.ext` JUnit test or by inspecting the LSP delegate command handler.
+- Build scripts, packaging, activation events, `package.json` contributions, or documentation → reproduce by reading/running the relevant script; no VS Code launch needed.
+
+When unsure, prefer the cheaper non-UI reproduction first; escalate to a UI test only if the behavior cannot be observed without the running view.
+
+## 3. Bring in the reporter's project
+
+Keep the committed footprint small and CI-reproducible:
+
+- **Public repo**: clone it as a sibling at runtime and point the plan's `workspace` at it while iterating locally:
+
+  ```powershell
+  git clone --depth 1 <repo-url> ..\repro-issue-<n>
+  ```
+
+  (`github.com` and `codeload.github.com` are on the coding-agent firewall's default allowlist, so the clone is not blocked.)
+
+- **Attached zip**: the issue body carries a link like `https://github.com/user-attachments/files/<id>/<name>.zip`. Download it (following the redirect) and unzip into a sibling dir, then point the plan's `workspace` at the extracted project:
+
+  ```powershell
+  # The user-attachments link 302-redirects to a signed objects.githubusercontent.com
+  # URL. BOTH github.com and objects.githubusercontent.com are on the coding-agent
+  # firewall's default allowlist, so this download is NOT blocked (unlike the VS Code
+  # binary). Use -L to follow the redirect. If the signed URL has expired, re-read the
+  # issue to get a fresh link, then re-download.
+  curl -L -o ..\repro-issue-<n>.zip "https://github.com/user-attachments/files/<id>/<name>.zip"
+  Expand-Archive ..\repro-issue-<n>.zip -DestinationPath ..\repro-issue-<n>   # bash: unzip
+  ```
+
+  **Treat the archive as untrusted input**: extract only — do not run its build scripts, Maven/Gradle wrappers, or other executables blindly. Confirm it is an ordinary Java project (`pom.xml` / `build.gradle` + `src/`), use it as the AutoTest `workspace:`, and commit only the minimal distilled fixture (never the raw zip or build outputs).
+
+- **Inline sources**: recreate the project under `test\e2e-fixtures\issue-<n>\` (or reuse `test/maven` / `test/invisible` if the existing fixtures already trigger the bug).
+- Once reproduced, **distill it to the minimal fixture** that still fails and commit that (not the whole user project) so the regression test runs in CI without external clones or large binaries.
+
+## 4. Reproduce
+
+**This whole step runs in your own environment — no CI needed.** Reproduce, fix, and prove the fix by running the plan/test yourself; CI only re-proves OS-specific cases (§5). VS Code is pre-warmed in the agent, so the local UI loop is fast.
+
+**UI path** — create `test/e2e-plans/repro-issue-<n>.yaml` following the `uitest` skill and `.github/instructions/uitest-plan.instructions.md`:
+
+```powershell
+npx -y @vscjava/vscode-autotest validate test\e2e-plans\repro-issue-<n>.yaml
+npm run build-server
+npx @vscode/vsce package -o vscode-java-dependency.vsix
+npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix vscode-java-dependency.vsix --no-llm --output test-results\repro-issue-<n>
+```
+
+**If the bug is OS-specific, name the plan for that OS** — you may not be able to reproduce it in your own environment at all (e.g. a Windows-only bug while the agent runs on Linux). The filename suffix routes the **CI fallback** (§5) to the right OS:
+
+- `repro-issue-<n>-windows.yaml` — a **Windows-only** bug (e.g. drive-letter / path-separator / `\`-vs-`/` issues). CI runs it on **Windows only**; the Linux gate skips it (the bug does not manifest there, so a Linux run would spuriously report `NOT_REPRODUCED`). A Linux agent cannot prove this one itself — reproduce by reasoning + code read, commit the `-windows` plan, and let CI (§5) run the red→green on Windows.
+- `repro-issue-<n>-linux.yaml` — a **Linux-only** bug. CI's Windows gate skips it. A Linux agent **can** reproduce this one itself.
+- `repro-issue-<n>.yaml` — an **OS-agnostic** bug. You can reproduce and prove it entirely in your own environment; CI additionally re-runs it on **both** OSes as a regression net.
+
+Pick the suffix from the report's platform: if the issue only reproduces on one OS, use that OS's suffix; only use the plain name when you have confirmed the bug is platform-independent.
+
+Author the plan step-by-step for the **actions**, but you do not need a verifier on every step — put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and record the failing step + the **actual observed value** as before-fix evidence (the screenshots stay in the git-ignored `test-results/`; CI hosts them as an artifact, see §5 — never commit them).
+
+**Run this on the un-fixed checkout FIRST — see RED before you write the fix.** That is the whole point of the reproduction: build + run the plan against the current (buggy) product code and confirm the decisive verifier fails with the reported symptom. Only then move to §5 and write the fix. This local red→green loop is fast in the agent env (VS Code is pre-warmed) and is what gives you confidence the plan actually reproduces before CI re-proves it.
+
+**Non-UI path** — add the failing `test/maven-suite` or `jdtls.ext` test and run the existing suite (`npm test`, or the `jdtls.ext` Maven test) to confirm it fails.
+
+## 5. Fix, then prove it — iterate until green
+
+1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
+2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
+3. Rerun the reproduction **in your own environment**; the same plan/test must now pass (red → green).
+4. **Iterate until you observe green** — follow the convergent loop below.
+5. **Capture evidence — keep binaries out of git.** Raw `test-results/` is **git-ignored**, and screenshots are **never committed to the repo**. Prove it two ways instead:
+   - **Textual before/after on the issue/PR (always — this is your primary proof).** Quote the red run's `results.json`: the decisive failing step and the **actual observed value** it produced (e.g. the clipboard text, the tree label), then the after-fix green result. Because you observed this yourself, it stands on its own.
+   - **Screenshots, GitHub-hosted, not in git.** Every committed `repro-issue-<n>.yaml` is also run by `.github/workflows/e2eUI.yml`, which uploads the full `test-results/` (screenshots + `results.json`) as a `repro-gate-results-<os>-<plan>` artifact — link that run/artifact for the images. For an inline visual, a maintainer (or you, if image upload is reachable) can drag a PNG into an issue or PR comment; GitHub hosts it on `user-images.githubusercontent.com`, still outside git. **Never add PNGs to the repository.**
+6. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
+
+### Iterate until green (the convergent loop)
+
+After each build+run, read `test-results/repro-issue-<n>/results.json` and the decisive step's screenshot, then branch:
+
+- **Head GREEN (and base was RED)** → done; you have proven the fix. Go to evidence (step 5).
+- **Head still a deterministic assertion `fail`** → the fix is wrong or incomplete. Read the *actual* observed state in `results.json` (e.g. the clipboard text, the tree label) — it tells you what the code really produced. Form a new hypothesis, adjust the fix (or the plan, if it asserts the wrong thing), rebuild, and rerun.
+- **`error` / `crash` (not a clean `fail`)** → treat as a **flaky/infra result, not a repro signal**: the language server may not have become ready, the tree may not have loaded, or the editor may not have launched. Increase `waitFor`/`timeout`, add a settle step, and **re-run**; never conclude anything about the bug from a crash/error. (A Linux run of a `-windows` plan can fail the same way: an env error, not a reproduction.)
+
+Repeat build→run→analyze until head is green. If after several honest iterations the fix is plausibly correct but the plan still fails only because of a harness/environment variant (e.g. the fixture runs from a `%TEMP%` worktree whose path form differs from a real install), do **not** force it: escalate to a maintainer with the evidence and your analysis, and label `needs-human-review`. A loop that stops with an explained blocker beats a green you faked.
+
+### CI: the OS-specific fallback and independent re-run
+
+Your own run is the primary proof. CI adds two things on top — **neither is required for an OS-agnostic bug you already proved locally**:
+
+1. **The execution surface you may lack.** For an OS-specific plan (`-windows` / `-linux`) that your agent OS cannot reproduce, CI is where the red→green actually runs. Commit the plan + fix; on the PR, `.github/workflows/e2eUI.yml` rebuilds the base (un-fixed) VSIX and runs the plan against base **and** head on that OS, and `.github/scripts/repro-gate.js` requires `base ❌ RED → head ✅ GREEN`.
+2. **An independent re-run** that does not trust your committed artifacts, plus the always-on `java-dep-*` regression net. After merge (push to `main`) the base already contains the fix, so the plan is demoted to an ordinary GREEN regression check.
+
+Gate verdicts (treat them exactly like your own run): `NOT_REPRODUCED` (the plan passed on the un-fixed base; tighten the decisive assertion to the **expected** behavior), `NOT_FIXED` (head still fails; read the head `results.json` and iterate), `INCONCLUSIVE` (base or head crashed/errored; treat as flaky and re-run). The `base ❌ RED → head ✅ GREEN` verdict + `repro-gate-results-<os>-<plan>` artifacts are the machine fix-proof for OS-specific plans.
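+
+As a rough sketch, the verdicts above form a small decision table over the base and head outcomes. The real logic lives in `.github/scripts/repro-gate.js`, which is not shown here; the `pass` / `fail` / `error` status labels, the order of the checks, the `gate_verdict` function name, and the `RED_TO_GREEN` label for the passing case are all assumptions made for illustration.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch: maps (base, head) outcomes to a gate verdict.
+# Each argument is assumed to be one of: pass | fail | error.
+gate_verdict() {
+  local base="$1" head="$2" s
+  for s in "$base" "$head"; do
+    case "$s" in
+      pass|fail|error) ;;
+      *) echo "unknown status: $s" >&2; return 2 ;;
+    esac
+  done
+  if [ "$base" = "error" ] || [ "$head" = "error" ]; then
+    echo "INCONCLUSIVE"      # crash/infra error on either side: re-run
+  elif [ "$base" = "pass" ]; then
+    echo "NOT_REPRODUCED"    # plan did not fail on the un-fixed base
+  elif [ "$head" = "fail" ]; then
+    echo "NOT_FIXED"         # bug reproduced, but head still fails
+  else
+    echo "RED_TO_GREEN"      # base failed an assertion, head passed
+  fi
+}
+
+gate_verdict fail pass    # prints: RED_TO_GREEN
+gate_verdict pass pass    # prints: NOT_REPRODUCED
+```
+
+Checking for `error` first reflects the rule that a crash never counts as a reproduction signal.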
+
+**Read CI back to close the loop from the agent** — pull the result and iterate without leaving the session:
+
+```bash
+rid=$(gh run list --branch "$BRANCH" --workflow "E2E UI Tests" -L1 --json databaseId -q '.[0].databaseId')
+gh run watch "$rid" || true
+gh run download "$rid" -n "repro-gate-results-windows-repro-issue-<n>" -D ci-evidence/
+# read ci-evidence/**/results.json + view the decisive screenshot, then fix/plan and push again
+```
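+
+The artifact name passed to `gh run download -n` follows the `repro-gate-results-<os>-<plan>` pattern, where `<plan>` is the full plan basename, so an OS-suffixed plan carries its suffix into the artifact name. A minimal sketch of composing it (`gate_artifact_name` is a hypothetical helper, not something the repository provides):
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: builds the gate artifact name for a given OS and plan.
+# <plan> is the plan file's basename without the .yaml extension.
+gate_artifact_name() {
+  local os="$1" plan="${2%.yaml}"
+  printf 'repro-gate-results-%s-%s\n' "$os" "$plan"
+}
+
+gate_artifact_name windows repro-issue-123-windows.yaml
+# prints: repro-gate-results-windows-repro-issue-123-windows
+```
+
+The result can then be used as the `-n` argument, for example `gh run download "$rid" -n "$(gate_artifact_name windows repro-issue-123-windows)" -D ci-evidence/`, with `123` standing in for the real issue number.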
+
+> **Approval note:** CI on a Copilot-authored PR may sit in `action_required` until a maintainer clicks **Approve and run**. That is expected and it does **not** block the self-run loop (which needs no CI). For an OS-specific bug, ask the maintainer to approve once, then read the result back as above.
+
+Because CI reconstructs the red from the base commit, your PR stays a single clean PR — **commit the repro plan and the fix together**; you never push a knowingly-broken commit.
+
+## 6. Report back
+
+Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green, or could not execute — and why). Never claim a green run you did not observe.
+
+- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**. State that you ran it red→green **in your own environment**, and show the proof as **text**: the decisive failing step and the **actual observed value** from your red run's `results.json`, plus the green after-fix result. For the images, link the CI `repro-gate-results-<os>-<plan>` artifact (and, for an OS-specific plan you could not run yourself, its `base ❌ RED → head ✅ GREEN` verdict). **Do not commit screenshots to the repo.** Reference the issue.
+- **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
+- **Reproduced but could not run the UI test**: remember a `(dns block)` on `update.code.visualstudio.com` is expected and non-fatal (see Environment notes) — it is **not** a reason to skip the UI path. Only if the editor genuinely never launches, commit the plan, explain the real failure, and fall back to a non-UI proof or ask a maintainer to unblock.
+- **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
+
+## Environment notes
+
+- The Copilot coding agent environment is prepared by `.github/workflows/copilot-setup-steps.yml` (JDK 21, Node 20, AutoTest, Xvfb, a baseline VSIX). Assume these are present.
+- That setup runs **before the agent firewall**, and its final step pre-downloads the **latest** VS Code (`stable`) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). Keep the plans on `vscodeVersion: "stable"` (do **not** pin a version) — `stable` always means the current latest release, and it is exactly what the pre-warm cached.
+- **A `(dns block)` on `update.code.visualstudio.com` at run time is EXPECTED and NON-FATAL — do not treat it as a UI-test failure or abandon the UI path.** AutoTest re-resolves `stable` over the network at launch; the firewall blocks that, but `@vscode/test-electron` catches it and **falls back to the already-cached latest VS Code**, and the Java extensions are already installed in `.vscode-test/extensions`. So the editor still launches offline. VS Code's own telemetry/Marketplace DNS calls are blocked too and are equally harmless.
+- Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
+- **Evidence: textual self-run proof + CI-hosted screenshots; never commit binaries.** Your primary proof is the run you did yourself — quote the decisive step and the **actual observed value** from the red run's `results.json`, then the green result, on the issue/PR. Screenshots are **not** committed to the repo: every `repro-issue-<n>.yaml` on a PR is run by CI (on the OS(es) the suffix implies) against base and head, and the whole `test-results/` (screenshots + `results.json`) is uploaded as `repro-gate-results-<os>-<plan>` artifacts with a `base ❌ RED → head ✅ GREEN` verdict — link those for the images and for an OS-specific plan you could not run yourself. (Ordinary `java-dep-*.yaml` regression plans upload `e2e-results-<os>-<plan>` from a single green run.) A human can drag an artifact PNG into a comment (`user-images.githubusercontent.com`) for an inline view — still out of git.
+- Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
+- **Issue attachments and repo clones are downloadable — they are NOT firewall-blocked.** `github.com`, `objects.githubusercontent.com`, `*.githubusercontent.com`, and `codeload.github.com` are all on the coding-agent's default allowlist, so cloning a linked public repo and `curl -L`-downloading an attached `user-attachments` zip both work at run time. (Only the VS Code binary host `update.code.visualstudio.com` is not allowlisted — that is why it is pre-warmed instead, see above.) Extract user-supplied zips as untrusted data: do not run their build scripts blindly.
+- Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml
new file mode 100644
index 00000000..8fab9cc4
--- /dev/null
+++ b/.github/workflows/copilot-setup-steps.yml
@@ -0,0 +1,80 @@
+name: "Copilot Setup Steps"
+
+# Prepares the GitHub Copilot coding agent's ephemeral environment so it can
+# build this extension and reproduce bugs with AutoTest UI/E2E plans without a
+# slow trial-and-error dependency hunt.
+#
+# The `copilot-setup-steps` job runs BEFORE Copilot starts working. It mirrors
+# the Linux path of `.github/workflows/e2eUI.yml`: JDK 21 + Node 20, the OSGi
+# bundle build (which also warms the Maven cache), the AutoTest CLI, and the
+# Xvfb / GTK libraries required to launch VS Code headless.
+#
+# NOTE: This workflow only takes effect once it is on the default branch.
+#
+# Setup steps run BEFORE the Copilot agent firewall is enabled, so the final
+# step pre-downloads VS Code (stable) and the Java Extension Pack into AutoTest's
+# `<repo>/.vscode-test` cache. That warms the exact files AutoTest fetches from
+# the VS Code CDN + Marketplace at run time — hosts the firewall blocks — so the
+# firewalled UI reproduction launches offline. See .github/skills/repro/SKILL.md.
+
+on:
+  workflow_dispatch:
+  push:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+  pull_request:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+
+jobs:
+  # The job MUST be called `copilot-setup-steps` or Copilot will not pick it up.
+  copilot-setup-steps:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+
+    # Lowest permissions needed for setup. Copilot gets its own token afterwards.
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: "21"
+          distribution: "temurin"
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+          cache: "npm"
+
+      - name: Install graphics libraries for headless VS Code
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libxkbfile-dev pkg-config libsecret-1-dev libxss1 dbus xvfb libgtk-3-0 libgbm1
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install global build & test tooling
+        run: npm install -g @vscode/vsce @vscjava/vscode-autotest
+
+      - name: Build OSGi bundle (warms the Maven cache)
+        run: npm run build-server
+
+      - name: Package a baseline VSIX
+        # Produces vscode-java-dependency.vsix so the first AutoTest run does not
+        # pay the full build cost. After editing src/** or jdtls.ext/**, Copilot
+        # must repackage before rerunning a plan, or it will run a stale VSIX.
+        run: vsce package -o vscode-java-dependency.vsix
+
+      - name: Pre-download VS Code and Java extensions (before firewall)
+        # Warms <repo>/.vscode-test so the firewalled agent run does not need the
+        # VS Code CDN or Marketplace. Best-effort: a failure here only degrades UI
+        # reproduction, so it must not block Copilot from starting work.
+        continue-on-error: true
+        run: node .github/scripts/prewarm-vscode.js
diff --git a/.github/workflows/e2eUI.yml b/.github/workflows/e2eUI.yml
index 96855ee7..2031b8ab 100644
--- a/.github/workflows/e2eUI.yml
+++ b/.github/workflows/e2eUI.yml
@@ -9,19 +9,42 @@ on:
 # Split-pipeline E2E UI workflow.
 #
 #   lint              → tslint + checkstyle (ubuntu, OS-agnostic)
-#   discover-plans    → emits a matrix of test-plan basenames
+#   discover-plans    → splits plans into two matrices:
+#                         · regression  = java-dep-*.yaml (run once, expect GREEN)
+#                         · repro       = repro-issue-*.yaml (red→green gate)
 #
-#   build-linux       ─┐
-#   e2e-linux  (×plan) ┤
-#                      ├──→ analyze  → unified summary covering both OSes
-#   build-windows     ─┤
-#   e2e-windows (×plan)┘
+#   build-linux        ─┐
+#   e2e-linux  (×reg)   ┤
+#                       ├──→ analyze  → unified summary covering both OSes
+#   build-windows      ─┤
+#   e2e-windows (×reg)  ┘
 #
-# Per-OS pipelines run completely independently: Linux e2e jobs do NOT
-# wait for the Windows VSIX build (and vice versa), so a slow Windows
-# build cannot delay the start of Linux e2e plans. Each matrix cell
-# surfaces as its own PR check, so failures are visible without an
-# extra gate job.
+#   build-base-linux   ─┐ (PR only, only when a repro-issue-*.yaml exists)
+#   repro-gate-linux   ─┤   builds the PR *base* (un-fixed) VSIX, runs the
+#   build-base-windows ─┤   repro plan against BOTH base and head, and proves
+#   repro-gate-windows ─┘   the bug is RED on base and GREEN on head.
+#
+# Per-OS pipelines run completely independently: Linux jobs do NOT wait for
+# the Windows VSIX build (and vice versa). Each matrix cell surfaces as its
+# own PR check, so failures are visible without an extra gate job.
+#
+# ── Red→green gate (model A) ────────────────────────────────
+# A regression plan run once only ever proves GREEN on the fixed code. To
+# prove a Copilot-authored repro plan genuinely captures the bug, the gate
+# rebuilds the PR's *base* commit (main, un-fixed) into its own VSIX and runs
+# the SAME repro-issue-<n>.yaml against base AND head in one CI run:
+#     base (main, un-fixed) → expect ❌ RED  (bug reproduced)
+#     head (PR, fixed)      → expect ✅ GREEN (fix works)
+# .github/scripts/repro-gate.js reads both results.json files and passes the
+# check only when base failed a deterministic assertion and head is all-pass
+# (distinguishing a genuine assertion RED from an infra crash). The gate runs
+# only on pull_request events; after merge (push to main) the base already
+# contains the fix, so the plan is demoted to an ordinary GREEN regression.
+#
+# discover-plans globs test/e2e-plans/*.yaml, so a Copilot-authored
+# repro-issue-<n>.yaml is picked up automatically. Full test-results/
+# (screenshots + results.json) upload as e2e-results-<os>-<plan>, or as
+# repro-gate-results-<os>-<plan> for gated repro PRs (the fix-proof).
 #
 # Inspired by vscode-java-pack/.github/workflows/e2e-autotest.yml.
 
@@ -60,7 +83,11 @@ jobs:
     name: Discover E2E Plans
     runs-on: ubuntu-latest
     outputs:
-      matrix: ${{ steps.scan.outputs.matrix }}
+      regression: ${{ steps.scan.outputs.regression }}
+      repro_linux: ${{ steps.scan.outputs.repro_linux }}
+      repro_windows: ${{ steps.scan.outputs.repro_windows }}
+      has_repro_linux: ${{ steps.scan.outputs.has_repro_linux }}
+      has_repro_windows: ${{ steps.scan.outputs.has_repro_windows }}
     steps:
       - uses: actions/checkout@v4
 
@@ -68,9 +95,48 @@ jobs:
         id: scan
         shell: bash
         run: |
-          plans=$(ls test/e2e-plans/*.yaml | xargs -n1 basename | sed 's/\.yaml$//' | jq -R . | jq -sc .)
-          echo "matrix=$plans" >> "$GITHUB_OUTPUT"
-          echo "Found plans: $plans"
+          all=$(ls test/e2e-plans/*.yaml | xargs -n1 basename | sed 's/\.yaml$//')
+          repro=$(printf '%s\n' "$all" | grep '^repro-issue-' || true)
+          regression=$(printf '%s\n' "$all" | grep -v '^repro-issue-' || true)
+
+          # The red→green gate only makes sense on a PR (it diffs base vs head).
+          # On push to main the base already contains the fix, so run every
+          # plan — including repro-issue-* — as an ordinary GREEN regression.
+          if [ "${{ github.event_name }}" != "pull_request" ]; then
+            regression="$all"
+            repro=""
+          fi
+
+          # OS-specific bugs: a repro plan can target one OS by filename suffix.
+          #   repro-issue-<n>-windows.yaml → Windows gate only
+          #   repro-issue-<n>-linux.yaml   → Linux gate only
+          #   repro-issue-<n>.yaml         → both gates (OS-agnostic bug)
+          # This avoids a Windows-only bug being reported NOT_REPRODUCED on the
+          # Linux gate (where the bug simply does not manifest), and vice versa.
+          repro_linux=$(printf '%s\n' "$repro"   | grep -v -- '-windows$' || true)
+          repro_windows=$(printf '%s\n' "$repro" | grep -v -- '-linux$'   || true)
+
+          to_json() {
+            local cleaned
+            cleaned=$(printf '%s\n' "$1" | grep -v '^[[:space:]]*$' || true)
+            if [ -z "$cleaned" ]; then
+              echo '[]'
+            else
+              printf '%s\n' "$cleaned" | jq -R . | jq -sc .
+            fi
+          }
+          reg_json=$(to_json "$regression")
+          linux_json=$(to_json "$repro_linux")
+          windows_json=$(to_json "$repro_windows")
+
+          echo "regression=$reg_json"      >> "$GITHUB_OUTPUT"
+          echo "repro_linux=$linux_json"   >> "$GITHUB_OUTPUT"
+          echo "repro_windows=$windows_json" >> "$GITHUB_OUTPUT"
+          [ "$linux_json"   = "[]" ] && echo "has_repro_linux=false"   >> "$GITHUB_OUTPUT" || echo "has_repro_linux=true"   >> "$GITHUB_OUTPUT"
+          [ "$windows_json" = "[]" ] && echo "has_repro_windows=false" >> "$GITHUB_OUTPUT" || echo "has_repro_windows=true" >> "$GITHUB_OUTPUT"
+          echo "Regression plans:   $reg_json"
+          echo "Repro (Linux gate): $linux_json"
+          echo "Repro (Win gate):   $windows_json"
 
   # ── Build VSIX (Linux) ──────────────────────────────────
   build-linux:
@@ -157,7 +223,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.matrix) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.regression) }}
 
     steps:
       - uses: actions/checkout@v4
@@ -220,7 +286,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.matrix) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.regression) }}
 
     steps:
       - uses: actions/checkout@v4
@@ -262,6 +328,240 @@ jobs:
           path: test-results/
           retention-days: 7
 
+  # ── Build base (un-fixed) VSIX for the red→green gate ───
+  # Only runs on PRs that add/contain a repro-issue-*.yaml plan. Checks out
+  # the PR's base commit (main, before the fix) and packages it so the gate
+  # can prove the repro plan is RED on un-fixed code.
+  build-base-linux:
+    name: Build base VSIX (Linux)
+    needs: [ discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_linux == 'true' }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install VSCE
+        run: npm install -g @vscode/vsce
+
+      - name: Build OSGi bundle
+        run: npm run build-server
+
+      - name: Build base VSIX file
+        run: vsce package -o vscode-java-dependency-base.vsix
+
+      - name: Upload base VSIX artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: vsix-base-linux
+          path: vscode-java-dependency-base.vsix
+          retention-days: 1
+
+  build-base-windows:
+    name: Build base VSIX (Windows)
+    needs: [ discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_windows == 'true' }}
+    runs-on: windows-latest
+    timeout-minutes: 20
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install VSCE
+        run: npm install -g @vscode/vsce
+
+      - name: Build OSGi bundle
+        run: npm run build-server
+
+      - name: Build base VSIX file
+        run: vsce package -o vscode-java-dependency-base.vsix
+
+      - name: Upload base VSIX artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: vsix-base-windows
+          path: vscode-java-dependency-base.vsix
+          retention-days: 1
+
+  # ── Red→green gate (Linux) ──────────────────────────────
+  # Runs each repro-issue-*.yaml against the base (un-fixed) VSIX and the
+  # head (fixed) VSIX, then repro-gate.js proves base=RED and head=GREEN.
+  repro-gate-linux:
+    name: Repro Gate Linux (${{ matrix.plan }})
+    needs: [ build-linux, build-base-linux, discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_linux == 'true' }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 40
+    strategy:
+      fail-fast: false
+      matrix:
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro_linux) }}
+
+    steps:
+      - uses: actions/checkout@v4   # head checkout provides the repro plan yaml
+
+      - name: Setup Build Environment (Xvfb)
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libxkbfile-dev pkg-config libsecret-1-dev libxss1 dbus xvfb libgtk-3-0 libgbm1
+          sudo /usr/bin/Xvfb :99 -screen 0 1920x1080x24 > /dev/null 2>&1 &
+          sleep 3
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Setup autotest
+        run: npm install -g @vscjava/vscode-autotest
+
+      - name: Download head VSIX (fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-linux
+          path: .
+
+      - name: Download base VSIX (un-fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-base-linux
+          path: .
+
+      - name: Run repro plan on base then head
+        shell: bash
+        run: |
+          # The base run is EXPECTED to exit non-zero (RED), so do not let
+          # the step fail here; repro-gate.js is the sole judge.
+          set +e
+          DISPLAY=:99 autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" \
+            --vsix "$(pwd)/vscode-java-dependency-base.vsix" \
+            --no-llm --output "test-results/base-${{ matrix.plan }}"
+          echo "base autotest exit: $?"
+          DISPLAY=:99 autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" \
+            --vsix "$(pwd)/vscode-java-dependency.vsix" \
+            --no-llm --output "test-results/head-${{ matrix.plan }}"
+          echo "head autotest exit: $?"
+          set -e
+
+      - name: Judge red→green
+        shell: bash
+        run: |
+          node .github/scripts/repro-gate.js \
+            "test-results/base-${{ matrix.plan }}/results.json" \
+            "test-results/head-${{ matrix.plan }}/results.json" \
+            "${{ matrix.plan }}" "Linux"
+
+      - name: Upload gate results
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: repro-gate-results-linux-${{ matrix.plan }}
+          path: test-results/
+          retention-days: 7
+
+  # ── Red→green gate (Windows) ────────────────────────────
+  repro-gate-windows:
+    name: Repro Gate Windows (${{ matrix.plan }})
+    needs: [ build-windows, build-base-windows, discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_windows == 'true' }}
+    runs-on: windows-latest
+    timeout-minutes: 40
+    strategy:
+      fail-fast: false
+      matrix:
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro_windows) }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Setup autotest
+        run: npm install -g @vscjava/vscode-autotest
+
+      - name: Download head VSIX (fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-windows
+          path: .
+
+      - name: Download base VSIX (un-fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-base-windows
+          path: .
+
+      - name: Run repro plan on base then head
+        shell: pwsh
+        run: |
+          $head = "$((Get-Location).Path)\vscode-java-dependency.vsix"
+          $base = "$((Get-Location).Path)\vscode-java-dependency-base.vsix"
+          autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" --vsix "$base" --no-llm --output "test-results\base-${{ matrix.plan }}"
+          Write-Host "base autotest exit: $LASTEXITCODE"
+          autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" --vsix "$head" --no-llm --output "test-results\head-${{ matrix.plan }}"
+          Write-Host "head autotest exit: $LASTEXITCODE"
+          # Do not fail the step on a non-zero autotest exit — the gate judges.
+          exit 0
+
+      - name: Judge red→green
+        shell: pwsh
+        run: |
+          node .github/scripts/repro-gate.js "test-results\base-${{ matrix.plan }}\results.json" "test-results\head-${{ matrix.plan }}\results.json" "${{ matrix.plan }}" "Windows"
+
+      - name: Upload gate results
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: repro-gate-results-windows-${{ matrix.plan }}
+          path: test-results/
+          retention-days: 7
+
   # ── Unified analysis across both OSes ───────────────────
   analyze:
     name: E2E Summary
diff --git a/.gitignore b/.gitignore
index 3ecbdbef..408afee3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,4 +15,5 @@ dist
 **/.project
 **/.checkstyle
 test-resources/
+test-results/
 **/.gradle