From 67c6598199fb76a91bf9aebf9431a067ac70ec5e Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 13:32:51 +0800
Subject: [PATCH 01/10] Add Copilot bug-reproduction agent workflow

Enable the Copilot coding agent to reproduce reported bugs when an issue is assigned to it:
- copilot-setup-steps.yml: preinstall JDK 21, Node 20, AutoTest, Xvfb/GTK, and a baseline VSIX so the agent can build and run UI tests without a trial-and-error dependency hunt
- repro skill: decide UI vs non-UI reproduction, pull in the reporter's project, distill a minimal committed fixture, reproduce (red), fix, prove (green), and report back
- copilot-instructions.md: add a Bug reproduction section routing assigned issues to the repro skill
- bug_report issue form + config: collect a minimal reproducible project and structured repro info for the agent
---
 .github/ISSUE_TEMPLATE/bug_report.yml     | 66 ++++++++++++++++++
 .github/ISSUE_TEMPLATE/config.yml         |  5 ++
 .github/copilot-instructions.md           |  8 +++
 .github/skills/repro/SKILL.md             | 81 +++++++++++++++++++++++
 .github/workflows/copilot-setup-steps.yml | 69 +++++++++++++++++++
 5 files changed, 229 insertions(+)
 create mode 100644 .github/ISSUE_TEMPLATE/bug_report.yml
 create mode 100644 .github/ISSUE_TEMPLATE/config.yml
 create mode 100644 .github/skills/repro/SKILL.md
 create mode 100644 .github/workflows/copilot-setup-steps.yml
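
Reviewer note (not part of the commit): step 1 of the repro skill collects fields from the `bug_report` form. A minimal sketch of that extraction follows, assuming GitHub renders each form field in the issue body as a `### <label>` heading followed by the answer. `parseIssueForm` is a hypothetical helper, not something this patch adds, and an answer that itself contains a `### ` heading would be split incorrectly.

```javascript
"use strict";

// Hypothetical helper (assumption, not part of this patch): split an
// issue-form body into { label: answer } pairs. Assumes every field is
// rendered as a "### <label>" heading followed by the reporter's answer.
function parseIssueForm(body) {
  const fields = {};
  let current = null;
  for (const line of body.split(/\r?\n/)) {
    const heading = /^###\s+(.*\S)\s*$/.exec(line);
    if (heading) {
      // Start collecting lines for a new field.
      current = heading[1];
      fields[current] = [];
    } else if (current !== null) {
      fields[current].push(line);
    }
  }
  // Join the collected lines and trim surrounding blank lines.
  for (const label of Object.keys(fields)) {
    fields[label] = fields[label].join("\n").trim();
  }
  return fields;
}

// Sample body using two labels from bug_report.yml.
const sample = [
  "### Describe the bug",
  "",
  "Tree node is missing.",
  "",
  "### Reproducible project",
  "",
  "https://github.com/<you>/<minimal-repro>",
].join("\n");

console.log(parseIssueForm(sample)["Reproducible project"]);
```

Keying on the label text rather than the field `id` is deliberate in this sketch: the `id` values in the form are not expected to appear in the rendered issue body, so the labels are the only stable handle under the assumption above.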
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
new file mode 100644
index 00000000..a8a7daec
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,66 @@
+name: 🐛 Bug report
+description: Report a bug in Project Manager for Java, with a reproducible project so it can be confirmed automatically.
+labels: ["bug"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for filing a bug! A **minimal reproducible project** lets a maintainer (or the Copilot agent) reproduce the issue with an AutoTest UI test and turn it into a regression test. The more precise the repro, the faster the fix.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Describe the bug
+      description: A clear and concise description of what the bug is.
+    validations:
+      required: true
+
+  - type: input
+    id: repro-project
+    attributes:
+      label: Reproducible project
+      description: A link to a public GitHub repo, or note that you attached a zip below. Prefer the smallest project that still shows the bug.
+      placeholder: "https://github.com/<you>/<minimal-repro>"
+    validations:
+      required: true
+
+  - type: textarea
+    id: steps
+    attributes:
+      label: Steps to reproduce
+      description: Exact steps against the project above. Name the affected surface (Java Projects tree, a context-menu / command, Referenced Libraries, export jar, new type, etc.).
+      placeholder: |
+        1. Open the project above in VS Code
+        2. Focus the Java Projects view
+        3. Expand src/main/java > com.example
+        4. Right-click App.java > ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behavior
+      description: What happened instead. Screenshots and the "Java" / "Language Support for Java" output-channel logs help a lot.
+    validations:
+      required: true
+
+  - type: textarea
+    id: versions
+    attributes:
+      label: Environment
+      description: Fill in the versions you are running.
+      value: |
+        - OS:
+        - VS Code version:
+        - Extension Pack for Java / Project Manager for Java version:
+        - JDK version:
+    validations:
+      required: true
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..271dd95f
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: true
+contact_links:
+  - name: 💬 Questions & discussions
+    url: https://github.com/microsoft/vscode-java-dependency/discussions
+    about: Ask usage questions or start a discussion instead of filing a bug.
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 451981cf..7a497d21 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,5 +1,13 @@
 # Copilot instructions for vscode-java-dependency
 
+## Bug reproduction
+
+- When an issue is assigned to Copilot, or you are asked to reproduce or confirm a reported bug, use the `repro` skill.
+- First decide whether the bug needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
+- Reproduce with the reporter's project: clone the linked repo as a sibling or recreate the zip/inline sources, then distill it to a **minimal committed fixture**. Do not commit whole user projects or large binaries.
+- Author the reproduction so it fails on the current build and passes after the fix, and leave it committed as a regression test (a new `test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically).
+- If no reproducible project is provided and the bug is environment-specific, ask for one and label `needs-more-info` — do not fabricate a fix for an unreproduced bug.
+
 ## UI and E2E tests
 
 - When asked to add, update, run, or debug UI/E2E coverage, prefer the AutoTest YAML workflow under `test/e2e-plans/`.
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
new file mode 100644
index 00000000..01cff921
--- /dev/null
+++ b/.github/skills/repro/SKILL.md
@@ -0,0 +1,81 @@
+---
+name: repro
+description: Reproduce a reported vscode-java-dependency (Project Manager for Java) bug from a GitHub issue, using the reporter's project. Decide whether a UI/E2E test is needed, reproduce with AutoTest when it is, and leave a committed regression test. Use when an issue is assigned to Copilot, when asked to reproduce/confirm a bug, or when triaging a "needs-repro" report.
+---
+
+# Reproduce a reported bug
+
+Use this skill when an issue is assigned to Copilot (or you are asked to reproduce/confirm a report) for `vscode-java-dependency` (Project Manager for Java).
+
+Goal: turn a bug report into a **deterministic, committed reproduction** that fails before the fix and passes after it. Prefer the smallest reproduction that proves the bug. Not every bug needs a UI test — decide first.
+
+## 1. Extract the report
+
+From the issue body (and the `bug_report` template fields) collect:
+
+- **Repro project** — a public GitHub repo link, an attached zip, or an inline `pom.xml` / `build.gradle` + sources. If none is provided and the bug is environment-specific, ask for one and label the issue `needs-more-info` instead of guessing.
+- **Steps to reproduce**, **expected** vs **actual** behavior, and the affected surface (tree view, context menu, command id, classpath, export jar, project creation, etc.).
+- **Versions** — VS Code, Extension Pack for Java, JDK, OS.
+
+## 2. Decide: does this need a UI/E2E test?
+
+**Use a UI/E2E AutoTest plan (`uitest` skill) when the bug is in the user-facing surface**, e.g.:
+
+- Java Projects tree rendering, ordering, labels, icons, or node presence/absence.
+- Context-menu / inline title actions, command palette entries, view focus/reveal.
+- Referenced Libraries / classpath UI (`../invisible` project), export jar, new type creation, link-with-editor, view modes.
+
+**Do NOT use a UI test — reproduce with a unit test or code analysis — when the bug is:**
+
+- Pure logic / data structures reachable from the extension API → add or extend a `test/maven-suite` test.
+- In the Java OSGi backend (`jdtls.ext/**`) → reproduce with a `jdtls.ext` JUnit test or by inspecting the LSP delegate command handler.
+- Build scripts, packaging, activation events, `package.json` contributions, or documentation → reproduce by reading/running the relevant script; no VS Code launch needed.
+
+When unsure, prefer the cheaper non-UI reproduction first; escalate to a UI test only if the behavior cannot be observed without the running view.
+
+## 3. Bring in the reporter's project
+
+Keep the committed footprint small and CI-reproducible:
+
+- **Public repo**: clone it as a sibling at runtime and point the plan's `workspace` at it while iterating locally:
+
+  ```powershell
+  git clone --depth 1 <repo-url> ..\repro-issue-<n>
+  ```
+
+- **Zip / inline**: recreate the project under `test\e2e-fixtures\issue-<n>\` (or reuse `test/maven` / `test/invisible` if the existing fixtures already trigger the bug).
+- Once reproduced, **distill it to the minimal fixture** that still fails and commit that (not the whole user project) so the regression test runs in CI without external clones or large binaries.
+
+## 4. Reproduce
+
+**UI path** — create `test/e2e-plans/repro-issue-<n>.yaml` following the `uitest` skill and `.github/instructions/uitest-plan.instructions.md`:
+
+```powershell
+npx -y @vscjava/vscode-autotest validate test\e2e-plans\repro-issue-<n>.yaml
+npm run build-server
+npx @vscode/vsce package -o vscode-java-dependency.vsix
+npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix vscode-java-dependency.vsix --no-llm --output test-results\repro-issue-<n>
+```
+
+Author the plan so its deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) asserts the **expected** behavior — it therefore **fails on the current (buggy) build**, capturing the bug. Inspect `test-results/repro-issue-<n>/results.json` and screenshots to confirm the failure matches the report.
+
+**Non-UI path** — add the failing `test/maven-suite` or `jdtls.ext` test and run the existing suite (`npm test`, or the `jdtls.ext` Maven test) to confirm it fails.
+
+## 5. Fix, then prove it
+
+1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
+2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
+3. Rerun the reproduction; the same plan/test must now pass (red → green).
+4. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
+
+## 6. Report back
+
+- **Reproduced + fixed**: open a PR citing the failing step / screenshot / `results.json` reason as evidence, and note that the committed reproduction now passes. Reference the issue.
+- **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
+- **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
+
+## Environment notes
+
+- The Copilot coding agent environment is prepared by `.github/workflows/copilot-setup-steps.yml` (JDK 21, Node 20, AutoTest, Xvfb, a baseline VSIX). Assume these are present.
+- AutoTest downloads VS Code and installs `vscjava.vscode-java-pack` at run time. If those network hosts are blocked by the agent firewall, UI reproduction cannot launch — fall back to the non-UI path and note the limitation, or ask a maintainer to allow the VS Code download + Marketplace hosts.
+- Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml
new file mode 100644
index 00000000..1aa29e3e
--- /dev/null
+++ b/.github/workflows/copilot-setup-steps.yml
@@ -0,0 +1,69 @@
+name: "Copilot Setup Steps"
+
+# Prepares the GitHub Copilot coding agent's ephemeral environment so it can
+# build this extension and reproduce bugs with AutoTest UI/E2E plans without a
+# slow trial-and-error dependency hunt.
+#
+# The `copilot-setup-steps` job runs BEFORE Copilot starts working. It mirrors
+# the Linux path of `.github/workflows/e2eUI.yml`: JDK 21 + Node 20, the OSGi
+# bundle build (which also warms the Maven cache), the AutoTest CLI, and the
+# Xvfb / GTK libraries required to launch VS Code headless.
+#
+# NOTE: This workflow only takes effect once it is on the default branch.
+# Reproducing UI tests additionally needs the Copilot firewall to allow the
+# VS Code download + Marketplace hosts — see .github/skills/repro/SKILL.md.
+
+on:
+  workflow_dispatch:
+  push:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+  pull_request:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+
+jobs:
+  # The job MUST be called `copilot-setup-steps` or Copilot will not pick it up.
+  copilot-setup-steps:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+
+    # Lowest permissions needed for setup. Copilot gets its own token afterwards.
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: "21"
+          distribution: "temurin"
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+          cache: "npm"
+
+      - name: Install graphics libraries for headless VS Code
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libxkbfile-dev pkg-config libsecret-1-dev libxss1 dbus xvfb libgtk-3-0 libgbm1
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install global build & test tooling
+        run: npm install -g @vscode/vsce @vscjava/vscode-autotest
+
+      - name: Build OSGi bundle (warms the Maven cache)
+        run: npm run build-server
+
+      - name: Package a baseline VSIX
+        # Produces vscode-java-dependency.vsix so the first AutoTest run does not
+        # pay the full build cost. Copilot must repackage after editing src/** or
+        # jdtls.ext/** before rerunning a plan against a stale VSIX.
+        run: vsce package -o vscode-java-dependency.vsix

From 40b14ad4e72d66ab4ed787f98581af41c8d11465 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 14:30:45 +0800
Subject: [PATCH 02/10] Pre-download VS Code + Java pack in setup-steps; refine
 repro guidance

Add .github/scripts/prewarm-vscode.js and a copilot-setup-steps step that warms AutoTest's <repo>/.vscode-test cache (VS Code stable + vscjava.vscode-java-pack) before the agent firewall engages, so firewalled UI reproductions launch offline.

Refine repro/uitest guidance: separate reproduction from fix-proof (a UI test's key value is its red->green screenshots), require verifiers only on the decisive assertion step, and make PRs state the repro method and execution status.
---
 .github/copilot-instructions.md               |  2 +-
 .../instructions/uitest-plan.instructions.md  |  5 +-
 .github/scripts/prewarm-vscode.js             | 95 +++++++++++++++++++
 .github/skills/repro/SKILL.md                 | 18 +++-
 .github/workflows/copilot-setup-steps.yml     | 15 ++-
 5 files changed, 126 insertions(+), 9 deletions(-)
 create mode 100644 .github/scripts/prewarm-vscode.js
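
Reviewer note (not part of the commit): the skill text tells the agent to fall back to the non-UI path when the cache is cold. A minimal sketch of how that check could look follows, assuming the layout described in the `prewarm-vscode.js` header comment (`<repo>/.vscode-test/vscode-<...>` plus `<repo>/.vscode-test/extensions`). `describeCache` is a hypothetical helper and is not added by this patch.

```javascript
"use strict";

const fs = require("fs");
const path = require("path");

// Hypothetical helper (assumption, not part of this patch): report whether the
// cache that prewarm-vscode.js fills looks usable. The directory layout is
// taken from that script's header comment and may differ between versions of
// @vscode/test-electron.
function describeCache(repoRoot) {
  const cacheDir = path.join(repoRoot, ".vscode-test");
  const entries = fs.existsSync(cacheDir) ? fs.readdirSync(cacheDir) : [];
  // A downloaded build is expected under a "vscode-<...>" entry.
  const hasVSCode = entries.some((name) => name.startsWith("vscode-"));

  // Count only directories so bookkeeping files are not mistaken for extensions.
  const extensionsDir = path.join(cacheDir, "extensions");
  const extensionCount = fs.existsSync(extensionsDir)
    ? fs
        .readdirSync(extensionsDir, { withFileTypes: true })
        .filter((entry) => entry.isDirectory()).length
    : 0;

  return { hasVSCode, extensionCount, warm: hasVSCode && extensionCount > 0 };
}

// Usage: run from the repository root, like the pre-warm script itself.
console.log(describeCache(process.cwd()));
```

This only inspects directory names. It does not prove that the cached build matches what "stable" resolves to at run time, so a `warm: true` result is a hint and not a guarantee that the UI run launches offline.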

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 7a497d21..162338e7 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -13,4 +13,4 @@
 - When asked to add, update, run, or debug UI/E2E coverage, prefer the AutoTest YAML workflow under `test/e2e-plans/`.
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
-- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) over screenshot-only checks.
+- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Use AutoTest screenshots to prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
diff --git a/.github/instructions/uitest-plan.instructions.md b/.github/instructions/uitest-plan.instructions.md
index 34cc6708..d2e2dbeb 100644
--- a/.github/instructions/uitest-plan.instructions.md
+++ b/.github/instructions/uitest-plan.instructions.md
@@ -31,12 +31,13 @@ action: 'clickViewTitleAction "Java Projects" "Unlink with Editor"'
 
 ## Verification rules
 
-- Add deterministic verification to every meaningful step. The natural-language `verify` field is context for humans and failure analysis; it is not pass/fail authority by itself, and it is auto-passed when a plan runs with `--no-llm`.
+- You do **not** need a verifier on every step. Author the *actions* step-by-step, but gate pass/fail with a deterministic verifier only on the **decisive assertion step(s)** — the step that captures the reported bug — plus any step prone to a silent no-op (see the `expandTreeItem` / free-form action caveat above). Intermediate action steps can rely on AutoTest screenshots instead of their own verifier.
+- The natural-language `verify` field is context for humans and failure analysis; it is not pass/fail authority by itself, and it is auto-passed when a plan runs with `--no-llm`. So the decisive step **must** carry a deterministic verifier, or a `--no-llm` run is a false green.
 - Use `verifyTreeItem` (with `name:`, optional `exact: true`, and `visible: false` for absence) as the authoritative check for Java Projects tree state.
 - Use `verifyFile` after operations that create, modify, or delete files on disk (new type, export jar, permanent delete). VS Code can open duplicate editor tabs with stale buffers, so prefer file-content checks over editor checks after such operations.
 - Use `verifyEditorTab` to assert which file an action opened, and `verifyClipboard` for copy-path commands.
 - On state-check steps whose only assertion is a deterministic verifier, omit the `verify:` field to avoid false LLM failures.
-- Use screenshots only as diagnostics produced by AutoTest; do not make screenshots the only evidence of pass/fail.
+- Screenshots are AutoTest's evidence that an action ran and are the primary artifact for **proving a fix** (a red run before, a green run after). Do not make a screenshot the sole pass/fail authority for the decisive assertion — pair it with a deterministic verifier.
 
 ## Local validation commands
 
diff --git a/.github/scripts/prewarm-vscode.js b/.github/scripts/prewarm-vscode.js
new file mode 100644
index 00000000..d077823d
--- /dev/null
+++ b/.github/scripts/prewarm-vscode.js
@@ -0,0 +1,95 @@
+#!/usr/bin/env node
+/*
+ * Pre-download VS Code + the Java extensions into AutoTest's cache BEFORE the
+ * Copilot coding agent's firewall is enabled.
+ *
+ * AutoTest (`@vscjava/vscode-autotest`) launches VS Code via `@vscode/test-electron`:
+ *   1. downloadAndUnzipVSCode(version)            -> <cwd>/.vscode-test/vscode-<...>
+ *   2. resolveCliArgsFromVSCodeExecutablePath()   -> --extensions-dir=<cwd>/.vscode-test/extensions
+ *   3. code --install-extension <id> --force      -> pulls Marketplace bits into that extensions dir
+ *
+ * The VS Code CDN (update.code.visualstudio.com) and the Marketplace are NOT on the
+ * Copilot agent's default firewall allowlist, so those network calls fail at run time.
+ * This script performs the exact same three operations during `copilot-setup-steps`
+ * (which runs before the firewall), so the caches are warm and the firewalled UI run
+ * hits them offline.
+ *
+ * Because `@vscode/test-electron` derives its cache from `process.cwd()`, this MUST run
+ * from the repository root — the same directory AutoTest runs from at agent time.
+ *
+ * Env overrides:
+ *   VSCODE_VERSION       VS Code channel/version to warm (default: "stable")
+ *   PREWARM_EXTENSIONS   comma-separated extension ids (default: "vscjava.vscode-java-pack")
+ */
+"use strict";
+
+const path = require("path");
+const cp = require("child_process");
+
+function resolveTestElectron() {
+  // Prefer the exact copy that the globally installed AutoTest uses, so the
+  // version and default-cache-path logic match the agent run byte-for-byte.
+  const candidates = [];
+  try {
+    const globalRoot = cp.execSync("npm root -g", { encoding: "utf-8" }).trim();
+    candidates.push(path.join(globalRoot, "@vscjava", "vscode-autotest"));
+    candidates.push(globalRoot);
+  } catch {
+    /* npm not on PATH — fall back to local resolution below */
+  }
+  candidates.push(process.cwd());
+  try {
+    const entry = require.resolve("@vscode/test-electron", { paths: candidates });
+    return require(entry);
+  } catch {
+    // Last resort: a plain require (works if it is a local dependency).
+    return require("@vscode/test-electron");
+  }
+}
+
+async function main() {
+  const version = process.env.VSCODE_VERSION || "stable";
+  const extensions = (process.env.PREWARM_EXTENSIONS || "vscjava.vscode-java-pack")
+    .split(",")
+    .map((s) => s.trim())
+    .filter(Boolean);
+
+  const { downloadAndUnzipVSCode, resolveCliArgsFromVSCodeExecutablePath } = resolveTestElectron();
+
+  console.log(`⬇️  Pre-downloading VS Code "${version}" into ${path.join(process.cwd(), ".vscode-test")} ...`);
+  const vscodePath = await downloadAndUnzipVSCode(version);
+  console.log(`✅ VS Code ready: ${vscodePath}`);
+
+  const [cli, ...baseArgs] = resolveCliArgsFromVSCodeExecutablePath(vscodePath);
+  const extensionsDir = baseArgs.find((a) => a.startsWith("--extensions-dir="))?.slice("--extensions-dir=".length);
+  console.log(`📁 Extensions dir: ${extensionsDir ?? "(default)"}`);
+
+  let failures = 0;
+  for (const ext of extensions) {
+    console.log(`📦 Installing ${ext} (+ Extension Pack members) ...`);
+    try {
+      cp.execFileSync(cli, [...baseArgs, "--install-extension", ext, "--force"], {
+        stdio: "inherit",
+        timeout: 300_000,
+        env: { ...process.env },
+        shell: process.platform === "win32",
+      });
+      console.log(`✅ Installed ${ext}`);
+    } catch (e) {
+      failures++;
+      console.warn(`⚠️  Failed to install ${ext}: ${e.message}`);
+    }
+  }
+
+  if (failures > 0) {
+    // Non-fatal: a missing extension only degrades UI reproduction, and the agent
+    // can still fall back to the non-UI path. Surface it without aborting setup.
+    console.warn(`⚠️  ${failures} extension(s) failed to pre-install; UI reproduction may be degraded.`);
+  }
+  console.log("🎉 VS Code + Java extensions pre-warmed for AutoTest.");
+}
+
+main().catch((err) => {
+  console.error("❌ Pre-warm failed:", err);
+  process.exit(1);
+});
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index 01cff921..d1782830 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -19,6 +19,11 @@ From the issue body (and the `bug_report` template fields) collect:
 
 ## 2. Decide: does this need a UI/E2E test?
 
+The reproduction and the fix-proof are two different questions — decide each:
+
+- **Reproduction** can often be non-UI or even a code read, especially for simple, obvious bugs. Prefer the cheapest reproduction that captures the report.
+- **Fix-proof** is where a UI/E2E test earns its cost: a red run before the fix and a green run after, with screenshots, is the strongest evidence for a user-facing bug. If the bug is user-facing, favour leaving a committed UI plan even when you first reproduced it another way.
+
 **Use a UI/E2E AutoTest plan (`uitest` skill) when the bug is in the user-facing surface**, e.g.:
 
 - Java Projects tree rendering, ordering, labels, icons, or node presence/absence.
@@ -57,7 +62,7 @@ npx @vscode/vsce package -o vscode-java-dependency.vsix
 npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix vscode-java-dependency.vsix --no-llm --output test-results\repro-issue-<n>
 ```
 
-Author the plan so its deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) asserts the **expected** behavior — it therefore **fails on the current (buggy) build**, capturing the bug. Inspect `test-results/repro-issue-<n>/results.json` and screenshots to confirm the failure matches the report.
+Author the plan step-by-step for the **actions**, but you do not need a verifier on every step — put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and keep the red-run screenshot as before-fix evidence.
 
 **Non-UI path** — add the failing `test/maven-suite` or `jdtls.ext` test and run the existing suite (`npm test`, or the `jdtls.ext` Maven test) to confirm it fails.
 
@@ -66,16 +71,21 @@ Author the plan so its deterministic verifier (`verifyTreeItem` / `verifyFile` /
 1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
 2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
 3. Rerun the reproduction; the same plan/test must now pass (red → green).
-4. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
+4. Keep both runs' evidence: the **before** (red) and **after** (green) screenshots plus the `results.json` reason. The green screenshot is the primary proof that the fix works — attach it (and the before/after pair) to the PR.
+5. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
 
 ## 6. Report back
 
-- **Reproduced + fixed**: open a PR citing the failing step / screenshot / `results.json` reason as evidence, and note that the committed reproduction now passes. Reference the issue.
+Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green with screenshots attached, or could not execute — e.g. the UI run was blocked — and why).
+
+- **Reproduced + fixed**: open a PR that attaches the before (red) and after (green) screenshots as the fix-proof, cites the failing step / `results.json` reason, and notes the committed reproduction now passes. Reference the issue.
 - **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
+- **Reproduced but could not run the UI test** (e.g. VS Code download / Marketplace blocked): commit the plan, explain what fails and why it could not execute, and either fall back to a non-UI proof or ask a maintainer to unblock — do not claim a green run you did not observe.
 - **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
 
 ## Environment notes
 
 - The Copilot coding agent environment is prepared by `.github/workflows/copilot-setup-steps.yml` (JDK 21, Node 20, AutoTest, Xvfb, a baseline VSIX). Assume these are present.
-- AutoTest downloads VS Code and installs `vscjava.vscode-java-pack` at run time. If those network hosts are blocked by the agent firewall, UI reproduction cannot launch — fall back to the non-UI path and note the limitation, or ask a maintainer to allow the VS Code download + Marketplace hosts.
+- That setup runs **before the agent firewall**, and its final step pre-downloads VS Code (stable) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). So the firewalled UI run should launch offline from that warm cache — you normally do **not** need to fetch VS Code or Marketplace bits yourself.
+- If the pre-warm did not run (e.g. an older branch) or the cache is cold, AutoTest will try to download VS Code + install `vscjava.vscode-java-pack` at run time. Those hosts (VS Code CDN + Marketplace) are firewall-blocked by default — if that happens, fall back to the non-UI path and note the limitation, or ask a maintainer to allow those hosts.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml
index 1aa29e3e..8fab9cc4 100644
--- a/.github/workflows/copilot-setup-steps.yml
+++ b/.github/workflows/copilot-setup-steps.yml
@@ -10,8 +10,12 @@ name: "Copilot Setup Steps"
 # Xvfb / GTK libraries required to launch VS Code headless.
 #
 # NOTE: This workflow only takes effect once it is on the default branch.
-# Reproducing UI tests additionally needs the Copilot firewall to allow the
-# VS Code download + Marketplace hosts — see .github/skills/repro/SKILL.md.
+#
+# Setup steps run BEFORE the Copilot agent firewall is enabled, so the final
+# step pre-downloads VS Code (stable) and the Java Extension Pack into AutoTest's
+# `<repo>/.vscode-test` cache. That warms the exact files AutoTest fetches from
+# the VS Code CDN + Marketplace at run time — hosts the firewall blocks — so the
+# firewalled UI reproduction launches offline. See .github/skills/repro/SKILL.md.
 
 on:
   workflow_dispatch:
@@ -67,3 +71,10 @@ jobs:
         # pay the full build cost. Copilot must repackage after editing src/** or
         # jdtls.ext/** before rerunning a plan against a stale VSIX.
         run: vsce package -o vscode-java-dependency.vsix
+
+      - name: Pre-download VS Code and Java extensions (before firewall)
+        # Warms <repo>/.vscode-test so the firewalled agent run does not need the
+        # VS Code CDN or Marketplace. Best-effort: a failure here only degrades UI
+        # reproduction, so it must not block Copilot from starting work.
+        continue-on-error: true
+        run: node .github/scripts/prewarm-vscode.js

From 2a9fe72c75e74fa7546222f512c28693807d59a6 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 15:17:43 +0800
Subject: [PATCH 03/10] Clarify latest-VS-Code offline flow and CI screenshot
 evidence for repro

Keep vscodeVersion stable (=latest, no pinning): document that the run-time DNS block on update.code.visualstudio.com is expected and non-fatal because @vscode/test-electron falls back to the pre-warmed cached build, so the agent must not abandon the UI path.

Point repro/uitest evidence at CI: e2eUI.yml already uploads full test-results (screenshots + results.json) as e2e-results-<os>-<plan> artifacts, so no manual screenshot attaching is needed. Note optional firewall allowlist for a fully clean run.
---
 .github/copilot-instructions.md                  |  3 ++-
 .github/instructions/uitest-plan.instructions.md |  2 +-
 .github/skills/repro/SKILL.md                    | 15 +++++++++------
 .github/workflows/e2eUI.yml                      |  5 +++++
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 162338e7..8f38266f 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -13,4 +13,5 @@
 - When asked to add, update, run, or debug UI/E2E coverage, prefer the AutoTest YAML workflow under `test/e2e-plans/`.
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
-- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Use AutoTest screenshots to prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
+- Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Screenshots prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
+- Do not attach screenshots by hand: `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts. Reference those artifacts as the fix-proof in a PR.
diff --git a/.github/instructions/uitest-plan.instructions.md b/.github/instructions/uitest-plan.instructions.md
index d2e2dbeb..f76b9b5f 100644
--- a/.github/instructions/uitest-plan.instructions.md
+++ b/.github/instructions/uitest-plan.instructions.md
@@ -9,7 +9,7 @@ Test plans under `test/e2e-plans/` are executable YAML files consumed by `@vscja
 
 ## Setup rules
 
-- Use `setup.extension: "vscjava.vscode-java-pack"` plus `setup.vscodeVersion: "stable"` for most scenarios. Installing the Extension Pack for Java pulls in every Java extension the Java Projects view relies on, so there is no need to install `redhat.java` separately.
+- Use `setup.extension: "vscjava.vscode-java-pack"` plus `setup.vscodeVersion: "stable"` for most scenarios. Installing the Extension Pack for Java pulls in every Java extension the Java Projects view relies on, so there is no need to install `redhat.java` separately. Keep `stable` (always the latest release) — do **not** pin a concrete version. In the Copilot agent, `.github/workflows/copilot-setup-steps.yml` pre-downloads that latest build + the pack before the firewall, and `@vscode/test-electron` falls back to the cached build when the run-time version check is blocked, so the plan runs offline without pinning.
 - Install the extension under test from a local VSIX at runtime with `--vsix vscode-java-dependency.vsix` — do not rely on a marketplace copy of `vscjava.vscode-java-dependency`.
 - Use existing in-repo fixtures as the workspace: `../maven` (a `maven-archetype-quickstart` project: `my-app` / `com.mycompany.app` / `App.java`) or `../invisible` (an unmanaged-folder project for referenced-library scenarios). Paths are relative to the test plan file. Do not add large binary fixtures.
 - Referenced-library / classpath commands (`java.project.addLibraries`, `java.project.removeLibrary`, `java.project.addLibraryFolders`, `java.project.refreshLibraries`) only apply to invisible projects — use `../invisible`, not `../maven`, for those.
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index d1782830..57fa43e3 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -71,21 +71,24 @@ Author the plan step-by-step for the **actions**, but you do not need a verifier
 1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
 2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
 3. Rerun the reproduction; the same plan/test must now pass (red → green).
-4. Keep both runs' evidence: the **before** (red) and **after** (green) screenshots plus the `results.json` reason. The green screenshot is the primary proof that the fix works — attach it (and the before/after pair) to the PR.
+4. Capture both runs' evidence: the **before** (red) and **after** (green) results. The green run is the primary proof the fix works. You do **not** need to attach images by hand — when the plan is on the PR, `.github/workflows/e2eUI.yml` re-runs it on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts. Link those in the PR and paste the `results.json` reason from your own red run.
 5. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
 
 ## 6. Report back
 
-Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green with screenshots attached, or could not execute — e.g. the UI run was blocked — and why).
+Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green, or could not execute — and why). Never claim a green run you did not observe.
 
-- **Reproduced + fixed**: open a PR that attaches the before (red) and after (green) screenshots as the fix-proof, cites the failing step / `results.json` reason, and notes the committed reproduction now passes. Reference the issue.
+- **Reproduced + fixed**: open a PR that links the CI `e2e-results-<os>-<plan>` artifacts as the fix-proof, cites the failing step / `results.json` reason from your red run, and notes the committed reproduction now passes. Reference the issue.
 - **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
-- **Reproduced but could not run the UI test** (e.g. VS Code download / Marketplace blocked): commit the plan, explain what fails and why it could not execute, and either fall back to a non-UI proof or ask a maintainer to unblock — do not claim a green run you did not observe.
+- **Reproduced but could not run the UI test**: remember a `(dns block)` on `update.code.visualstudio.com` is expected and non-fatal (see Environment notes) — it is **not** a reason to skip the UI path. Only if the editor genuinely never launches, commit the plan, explain the real failure, and fall back to a non-UI proof or ask a maintainer to unblock.
 - **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
 
 ## Environment notes
 
 - The Copilot coding agent environment is prepared by `.github/workflows/copilot-setup-steps.yml` (JDK 21, Node 20, AutoTest, Xvfb, a baseline VSIX). Assume these are present.
-- That setup runs **before the agent firewall**, and its final step pre-downloads VS Code (stable) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). So the firewalled UI run should launch offline from that warm cache — you normally do **not** need to fetch VS Code or Marketplace bits yourself.
-- If the pre-warm did not run (e.g. an older branch) or the cache is cold, AutoTest will try to download VS Code + install `vscjava.vscode-java-pack` at run time. Those hosts (VS Code CDN + Marketplace) are firewall-blocked by default — if that happens, fall back to the non-UI path and note the limitation, or ask a maintainer to allow those hosts.
+- That setup runs **before the agent firewall**, and its final step pre-downloads the **latest** VS Code (`stable`) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). Keep the plans on `vscodeVersion: "stable"` (do **not** pin a version) — `stable` always means the current latest release, and it is exactly what the pre-warm cached.
+- **A `(dns block)` on `update.code.visualstudio.com` at run time is EXPECTED and NON-FATAL — do not treat it as a UI-test failure or abandon the UI path.** AutoTest re-resolves `stable` over the network at launch; the firewall blocks that, but `@vscode/test-electron` catches it and **falls back to the already-cached latest VS Code**, and the Java extensions are already installed in `.vscode-test/extensions`. So the editor still launches offline. VS Code's own telemetry/Marketplace DNS calls are blocked too and are equally harmless.
+- Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
+- **Screenshots / results are captured for you by CI, not by hand.** When the reproduction plan lands on a PR (base `main`), `.github/workflows/e2eUI.yml` runs it on Linux **and** Windows and uploads the whole `test-results/` directory (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts, plus an aggregate `summary.md`. In the PR body, link those artifacts as the fix-proof and paste the `results.json` failure reason from your own red run — you do not need to attach images manually.
+- Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.github/workflows/e2eUI.yml b/.github/workflows/e2eUI.yml
index 96855ee7..b2c04972 100644
--- a/.github/workflows/e2eUI.yml
+++ b/.github/workflows/e2eUI.yml
@@ -23,6 +23,11 @@ on:
 # surfaces as its own PR check, so failures are visible without an
 # extra gate job.
 #
+# discover-plans globs test/e2e-plans/*.yaml, so a Copilot-authored
+# repro-issue-<n>.yaml is picked up automatically. Each plan's full
+# test-results/ (screenshots + results.json) is uploaded as the
+# e2e-results-<os>-<plan> artifact — the fix-proof for repro PRs.
+#
 # Inspired by vscode-java-pack/.github/workflows/e2e-autotest.yml.
 
 jobs:

From 48cd0ecf66218782cf649cddc66c146f2f9ebfe8 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 16:05:23 +0800
Subject: [PATCH 04/10] ci: add red->green gate proving repro plans fail on
 base and pass on fix

For repro-issue-<n>.yaml plans, e2eUI now rebuilds the PR base (un-fixed)
into its own VSIX and runs the plan against base AND head in one CI run,
requiring base=RED (deterministic assertion fail) and head=GREEN. This
turns 'red->green' into a machine-checked invariant instead of prose in
the PR body, closing the gap where an agent asserts a fix works without
actually reproducing the bug.

- .github/scripts/repro-gate.js: judge that reads both results.json files,
  distinguishes a genuine assertion RED from an infra crash/error, and
  exits non-zero with a clear verdict (NOT_REPRODUCED / NOT_FIXED /
  INCONCLUSIVE) plus a job-summary table.
- e2eUI.yml: discover-plans splits regression vs repro matrices; adds
  build-base-{linux,windows} + repro-gate-{linux,windows} (PR-only, only
  when a repro plan exists). After merge to main the plan is demoted to an
  ordinary green regression.
- repro/SKILL.md, copilot-instructions.md: document the gate, red-first
  local loop, and single-PR (plan+fix) flow.
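
A minimal sketch of the verdict rules listed above, for illustration only. gateVerdict and gateExitCode are hypothetical helpers, not functions in repro-gate.js; the classification strings mirror the ones the judge uses and are assumed to come from classifying each results.json beforehand.

```javascript
"use strict";

// Hypothetical helper for illustration; not part of repro-gate.js.
// baseKind: "RED" | "GREEN" | "ERRORED" | "CRASHED"
// headKind: "RED" | "GREEN" | "CRASHED"
function gateVerdict(baseKind, headKind) {
  // A crash or infra error on base says nothing about the bug.
  if (baseKind === "CRASHED" || baseKind === "ERRORED") return "INCONCLUSIVE";
  // Base ran clean, so the plan does not capture the bug.
  if (baseKind === "GREEN") return "NOT_REPRODUCED";
  // Base is RED from here on; only head decides the outcome.
  if (headKind === "CRASHED") return "INCONCLUSIVE";
  if (headKind === "RED") return "NOT_FIXED";
  return "PROVEN";
}

// Only PROVEN lets the check pass; every other verdict fails the job.
function gateExitCode(verdict) {
  return verdict === "PROVEN" ? 0 : 1;
}

console.log(gateVerdict("RED", "GREEN"));     // PROVEN
console.log(gateVerdict("GREEN", "GREEN"));   // NOT_REPRODUCED
console.log(gateVerdict("RED", "RED"));       // NOT_FIXED
console.log(gateVerdict("CRASHED", "GREEN")); // INCONCLUSIVE
```

Checking base before head means a plan that never reproduces is reported as NOT_REPRODUCED even when head is also green, which is the case the gate exists to catch.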
---
 .github/copilot-instructions.md |   4 +-
 .github/scripts/repro-gate.js   | 186 +++++++++++++++++++
 .github/skills/repro/SKILL.md   |  25 ++-
 .github/workflows/e2eUI.yml     | 310 ++++++++++++++++++++++++++++++--
 4 files changed, 503 insertions(+), 22 deletions(-)
 create mode 100644 .github/scripts/repro-gate.js

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 8f38266f..24de89ab 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -5,7 +5,7 @@
 - When an issue is assigned to Copilot, or you are asked to reproduce or confirm a reported bug, use the `repro` skill.
 - First decide whether the bug needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
 - Reproduce with the reporter's project: clone the linked repo as a sibling or recreate the zip/inline sources, then distill it to a **minimal committed fixture**. Do not commit whole user projects or large binaries.
-- Author the reproduction so it fails on the current build and passes after the fix, and leave it committed as a regression test (a new `test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically).
+- Author the reproduction so it fails on the current build and passes after the fix, and leave it committed as a regression test (a new `test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically). Commit the repro plan and the fix **together in one PR** — CI's red→green gate rebuilds the PR base to prove the red, so you never push a knowingly-broken commit.
 - If no reproducible project is provided and the bug is environment-specific, ask for one and label `needs-more-info` — do not fabricate a fix for an unreproduced bug.
 
 ## UI and E2E tests
@@ -14,4 +14,4 @@
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
 - Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Screenshots prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
-- Do not attach screenshots by hand: `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts. Reference those artifacts as the fix-proof in a PR.
+- Do not attach screenshots by hand: `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as artifacts. For a `repro-issue-<n>.yaml`, a **red→green gate** additionally rebuilds the PR base (un-fixed) and runs the plan against base **and** head, requiring `base ❌ RED → head ✅ GREEN`; its verdict + `repro-gate-results-<os>-<plan>` artifacts are the authoritative fix-proof. Ordinary regression plans upload `e2e-results-<os>-<plan>`. Reference the relevant artifacts as the fix-proof in a PR.
diff --git a/.github/scripts/repro-gate.js b/.github/scripts/repro-gate.js
new file mode 100644
index 00000000..69fffa41
--- /dev/null
+++ b/.github/scripts/repro-gate.js
@@ -0,0 +1,186 @@
+#!/usr/bin/env node
+// Repro red→green gate judge.
+//
+// Decides, from two AutoTest `results.json` files, whether a repro plan
+// genuinely proves a bug fix: it must FAIL on the un-fixed base build (RED)
+// and PASS on the fixed head build (GREEN). Run by the `repro-gate-*` jobs in
+// .github/workflows/e2eUI.yml, once per repro-issue-<n>.yaml plan per OS.
+//
+// Usage:
+//   node repro-gate.js <baseResultsJson> <headResultsJson> [planName] [os]
+//
+// Exit codes:
+//   0  RED→GREEN proven (base failed a deterministic assertion, head all-pass)
+//   1  gate failed — one of:
+//        NOT_REPRODUCED  base passed        → plan does not reproduce the bug
+//        NOT_FIXED       head still fails   → fix does not resolve the bug
+//        INCONCLUSIVE    base crashed/errored, or head crashed (infra flake) → retry
+//
+// Why summary.failed (not the process exit code) decides RED:
+//   `autotest run` exits 1 for BOTH a real assertion failure and a crash /
+//   infra error. Only a deterministic assertion `fail` (summary.failed >= 1,
+//   not `errors`, not `crashed`) counts as a genuine reproduction. A crash on
+//   base would otherwise be mis-read as "reproduced".
+
+"use strict";
+
+const fs = require("fs");
+
+function loadReport(p) {
+  try {
+    const raw = fs.readFileSync(p, "utf8");
+    const json = JSON.parse(raw);
+    return { ...json, ok: true }; // ok goes last so a field in the report cannot override it
+  } catch (e) {
+    return { ok: false, missing: true, loadError: e.message };
+  }
+}
+
+function summaryOf(r) {
+  const s = r.summary || {};
+  return {
+    total: s.total ?? 0,
+    passed: s.passed ?? 0,
+    failed: s.failed ?? 0,
+    errors: s.errors ?? 0,
+    skipped: s.skipped ?? 0,
+  };
+}
+
+function failingSteps(r) {
+  return (r.results || [])
+    .filter((s) => s.status === "fail" || s.status === "error")
+    .map((s) => ({
+      stepId: s.stepId,
+      action: s.action,
+      status: s.status,
+      reason: (s.reason || "").toString().slice(0, 300),
+    }));
+}
+
+function classifyBase(r) {
+  // A trustworthy RED = did not crash AND at least one deterministic
+  // assertion `fail`. Errors / crashes are infra noise, not reproduction.
+  if (!r.ok || r.crashed === true) return "CRASHED";
+  const s = summaryOf(r);
+  if (s.failed >= 1) return "RED";
+  if (s.errors >= 1) return "ERRORED";
+  return "GREEN"; // ran clean, nothing failed → did NOT reproduce
+}
+
+function classifyHead(r) {
+  if (!r.ok || r.crashed === true) return "CRASHED";
+  const s = summaryOf(r);
+  if (s.failed === 0 && s.errors === 0) return "GREEN";
+  return "RED"; // fix build still failing / erroring
+}
+
+function icon(kind) {
+  return { RED: "❌", GREEN: "✅", CRASHED: "💥", ERRORED: "⚠️", ERROR: "⚠️" }[kind] || "❔";
+}
+
+function main() {
+  const [baseJson, headJson, planNameArg, osArg] = process.argv.slice(2);
+  if (!baseJson || !headJson) {
+    console.error("usage: repro-gate.js <baseResultsJson> <headResultsJson> [plan] [os]");
+    process.exit(2);
+  }
+  const plan = planNameArg || "repro-plan";
+  const os = osArg || process.env.RUNNER_OS || "";
+
+  const base = loadReport(baseJson);
+  const head = loadReport(headJson);
+  const baseKind = classifyBase(base);
+  const headKind = classifyHead(head);
+  const baseSum = summaryOf(base);
+  const headSum = summaryOf(head);
+
+  // ── Verdict ──────────────────────────────────────────────
+  let verdict, exit, message;
+  if (baseKind === "CRASHED" || baseKind === "ERRORED") {
+    verdict = "INCONCLUSIVE";
+    exit = 1;
+    message =
+      `Base (un-fixed) run did not produce a clean assertion result ` +
+      `(${baseKind.toLowerCase()}). This is an infrastructure flake, not a ` +
+      `reproduction — re-run the job. If it persists, the editor is not ` +
+      `launching (check the pre-warm / .vscode-test cache).`;
+  } else if (baseKind === "GREEN") {
+    verdict = "NOT_REPRODUCED";
+    exit = 1;
+    message =
+      `The repro plan PASSED on the un-fixed base build, so it does NOT ` +
+      `reproduce the bug (no RED). Tighten the decisive assertion so it ` +
+      `asserts the EXPECTED behaviour and therefore fails on the buggy build.`;
+  } else if (headKind === "CRASHED") {
+    verdict = "INCONCLUSIVE";
+    exit = 1;
+    message =
+      `Base reproduced the bug (RED), but the fixed head run crashed — ` +
+      `infrastructure flake, re-run the job.`;
+  } else if (headKind === "RED") {
+    verdict = "NOT_FIXED";
+    exit = 1;
+    message =
+      `The fix build STILL FAILS the repro plan (no GREEN), so the bug is ` +
+      `not resolved. See the failing head step(s) below.`;
+  } else {
+    verdict = "PROVEN";
+    exit = 0;
+    message = `RED→GREEN proven: the bug reproduces on base and is fixed on head.`;
+  }
+
+  // ── Markdown report ──────────────────────────────────────
+  const title = `Repro red→green gate — \`${plan}\`${os ? ` (${os})` : ""}`;
+  const baseDecisive =
+    baseKind === "RED"
+      ? failingSteps(base).map((s) => `\`${s.stepId}\`: ${s.reason || s.status}`).join("<br>") || "—"
+      : baseKind === "GREEN"
+      ? "no step failed (did not reproduce)"
+      : (base.crashReason || base.loadError || baseKind);
+  const headDecisive =
+    headKind === "GREEN"
+      ? `all ${headSum.total} step(s) passed`
+      : headKind === "RED"
+      ? failingSteps(head).map((s) => `\`${s.stepId}\`: ${s.reason || s.status}`).join("<br>") || "—"
+      : (head.crashReason || head.loadError || headKind);
+
+  const md = [
+    `### ${title}`,
+    ``,
+    `**Verdict: ${exit === 0 ? "✅" : "❌"} ${verdict}** — ${message}`,
+    ``,
+    `| Build | Under test | Result | Steps (p/f/e) | Decisive |`,
+    `|-------|-----------|--------|---------------|----------|`,
+    `| base | \`main\` (un-fixed) | ${icon(baseKind)} ${baseKind} | ${baseSum.passed}/${baseSum.failed}/${baseSum.errors} | ${baseDecisive} |`,
+    `| head | PR (fix) | ${icon(headKind)} ${headKind} | ${headSum.passed}/${headSum.failed}/${headSum.errors} | ${headDecisive} |`,
+    ``,
+    exit === 0
+      ? `> The base build reproduces the bug and the head build fixes it — a genuine regression guard.`
+      : `> Gate blocked: ${verdict}. ${message}`,
+    ``,
+  ].join("\n");
+
+  console.log(md);
+
+  // GitHub job summary
+  const summaryFile = process.env.GITHUB_STEP_SUMMARY;
+  if (summaryFile) {
+    try {
+      fs.appendFileSync(summaryFile, md + "\n");
+    } catch (e) {
+      console.error(`(could not write job summary: ${e.message})`);
+    }
+  }
+
+  // Workflow annotation
+  if (exit === 0) {
+    console.log(`::notice title=Repro gate ${plan}::${verdict} — ${message}`);
+  } else {
+    console.log(`::error title=Repro gate ${plan}::${verdict} — ${message}`);
+  }
+
+  process.exit(exit);
+}
+
+main();
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index 57fa43e3..7c69a4ed 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -23,6 +23,7 @@ The reproduction and the fix-proof are two different questions — decide each:
 
 - **Reproduction** can often be non-UI or even a code read, especially for simple, obvious bugs. Prefer the cheapest reproduction that captures the report.
 - **Fix-proof** is where a UI/E2E test earns its cost: a red run before the fix and a green run after, with screenshots, is the strongest evidence for a user-facing bug. If the bug is user-facing, favour leaving a committed UI plan even when you first reproduced it another way.
+- **The red→green is proven by CI, not by prose.** When you commit a `test/e2e-plans/repro-issue-<n>.yaml`, `.github/workflows/e2eUI.yml` runs a **red→green gate** (see §5) that rebuilds the PR's base (un-fixed) code and runs your plan against base **and** head, requiring `base = RED, head = GREEN`. So your job is to author a plan whose decisive assertion **fails on the un-fixed build and passes on the fix** — the gate does the proving. Do not merely assert red→green in the PR body; make the plan actually reproduce.
 
 **Use a UI/E2E AutoTest plan (`uitest` skill) when the bug is in the user-facing surface**, e.g.:
 
@@ -64,6 +65,8 @@ npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix v
 
 Author the plan step-by-step for the **actions**, but you do not need a verifier on every step — put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and keep the red-run screenshot as before-fix evidence.
 
+**Run this on the un-fixed checkout FIRST — see RED before you write the fix.** That is the whole point of the reproduction: build + run the plan against the current (buggy) product code and confirm the decisive verifier fails with the reported symptom. Only then move to §5 and write the fix. This local red→green loop is fast in the agent env (VS Code is pre-warmed) and is what gives you confidence the plan actually reproduces before CI re-proves it.
+
 **Non-UI path** — add the failing `test/maven-suite` or `jdtls.ext` test and run the existing suite (`npm test`, or the `jdtls.ext` Maven test) to confirm it fails.
 
 ## 5. Fix, then prove it
@@ -71,14 +74,30 @@ Author the plan step-by-step for the **actions**, but you do not need a verifier
 1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
 2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
 3. Rerun the reproduction; the same plan/test must now pass (red → green).
-4. Capture both runs' evidence: the **before** (red) and **after** (green) results. The green run is the primary proof the fix works. You do **not** need to attach images by hand — when the plan is on the PR, `.github/workflows/e2eUI.yml` re-runs it on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts. Link those in the PR and paste the `results.json` reason from your own red run.
+4. Capture both runs' evidence: the **before** (red) and **after** (green) results. The green run is the primary proof the fix works. You do **not** need to attach images by hand — when the plan is on the PR, `.github/workflows/e2eUI.yml` re-runs it on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as artifacts. Link those in the PR and paste the `results.json` reason from your own red run.
 5. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
 
+### The CI red→green gate (authoritative proof)
+
+A regression plan run once only ever proves GREEN on the fixed code. So for a `repro-issue-<n>.yaml`, `.github/workflows/e2eUI.yml` runs a dedicated **red→green gate** that is the authoritative machine proof. You do **not** have to argue the red→green in the PR body:
+
+- On a pull request, the gate **rebuilds the PR's base commit** (`main`, before your fix) into its own VSIX, then runs your repro plan against **both** builds in one CI run:
+  - **base (un-fixed) → must be ❌ RED** — a deterministic assertion `fail` (not a crash/error), proving the plan reproduces the bug.
+  - **head (fix) → must be ✅ GREEN** — all steps pass, proving the fix works.
+- `.github/scripts/repro-gate.js` reads both `results.json` files and passes the check only for `base RED && head GREEN`. It fails with a clear verdict otherwise:
+  - `NOT_REPRODUCED` — your plan passed on the un-fixed base, so it does **not** capture the bug. Tighten the decisive assertion so it asserts the **expected** behaviour.
+  - `NOT_FIXED` — head still fails or errors; the bug is not resolved.
+  - `INCONCLUSIVE` — base crashed/errored, or head crashed (infra flake); re-run the job.
+- The gate's verdict table (`base ❌ RED → head ✅ GREEN`) is written to the job summary, and both runs' `test-results/` are uploaded as `repro-gate-results-<os>-<plan>` artifacts (screenshots + `results.json`). **This is the fix-proof** — reference it in the PR.
+- The gate runs only on `pull_request` events. After merge (push to `main`) the base already contains the fix, so the same plan is demoted to an ordinary GREEN regression check.
+
+Because CI reconstructs the red from the base commit, your PR stays a single clean PR — **commit the repro plan and the fix together**; you never have to push a knowingly-broken commit to demonstrate the red.
+
 ## 6. Report back
 
 Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green, or could not execute — and why). Never claim a green run you did not observe.
 
-- **Reproduced + fixed**: open a PR that links the CI `e2e-results-<os>-<plan>` artifacts as the fix-proof, cites the failing step / `results.json` reason from your red run, and notes the committed reproduction now passes. Reference the issue.
+- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**, and let the red→green gate (§5) prove it. In the PR body, reference the gate's `repro-gate-results-<os>-<plan>` artifacts and its `base ❌ RED → head ✅ GREEN` verdict, and cite the failing step / `results.json` reason from your own local red run. Reference the issue.
 - **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
 - **Reproduced but could not run the UI test**: remember a `(dns block)` on `update.code.visualstudio.com` is expected and non-fatal (see Environment notes) — it is **not** a reason to skip the UI path. Only if the editor genuinely never launches, commit the plan, explain the real failure, and fall back to a non-UI proof or ask a maintainer to unblock.
 - **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
@@ -89,6 +108,6 @@ Every PR or comment must state **how you reproduced** (UI plan vs unit test vs c
 - That setup runs **before the agent firewall**, and its final step pre-downloads the **latest** VS Code (`stable`) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). Keep the plans on `vscodeVersion: "stable"` (do **not** pin a version) — `stable` always means the current latest release, and it is exactly what the pre-warm cached.
 - **A `(dns block)` on `update.code.visualstudio.com` at run time is EXPECTED and NON-FATAL — do not treat it as a UI-test failure or abandon the UI path.** AutoTest re-resolves `stable` over the network at launch; the firewall blocks that, but `@vscode/test-electron` catches it and **falls back to the already-cached latest VS Code**, and the Java extensions are already installed in `.vscode-test/extensions`. So the editor still launches offline. VS Code's own telemetry/Marketplace DNS calls are blocked too and are equally harmless.
 - Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
-- **Screenshots / results are captured for you by CI, not by hand.** When the reproduction plan lands on a PR (base `main`), `.github/workflows/e2eUI.yml` runs it on Linux **and** Windows and uploads the whole `test-results/` directory (screenshots + `results.json`) as `e2e-results-<os>-<plan>` artifacts, plus an aggregate `summary.md`. In the PR body, link those artifacts as the fix-proof and paste the `results.json` failure reason from your own red run — you do not need to attach images manually.
+- **Screenshots / results are captured for you by CI, not by hand.** When a `repro-issue-<n>.yaml` lands on a PR (base `main`), the red→green gate (§5) runs it on Linux **and** Windows against the base and head builds and uploads the whole `test-results/` directory (screenshots + `results.json` for both the base RED and head GREEN runs) as `repro-gate-results-<os>-<plan>` artifacts, plus a `base ❌ RED → head ✅ GREEN` verdict in the job summary. In the PR body, reference those artifacts and the verdict as the fix-proof, and paste the `results.json` failure reason from your own local red run — you do not need to attach images manually. (Ordinary `java-dep-*.yaml` regression plans still upload `e2e-results-<os>-<plan>` from a single green run.)
 - Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.github/workflows/e2eUI.yml b/.github/workflows/e2eUI.yml
index b2c04972..77a707e2 100644
--- a/.github/workflows/e2eUI.yml
+++ b/.github/workflows/e2eUI.yml
@@ -9,19 +9,37 @@ on:
 # Split-pipeline E2E UI workflow.
 #
 #   lint              → tslint + checkstyle (ubuntu, OS-agnostic)
-#   discover-plans    → emits a matrix of test-plan basenames
+#   discover-plans    → splits plans into two matrices:
+#                         · regression  = java-dep-*.yaml (run once, expect GREEN)
+#                         · repro       = repro-issue-*.yaml (red→green gate)
 #
-#   build-linux       ─┐
-#   e2e-linux  (×plan) ┤
-#                      ├──→ analyze  → unified summary covering both OSes
-#   build-windows     ─┤
-#   e2e-windows (×plan)┘
+#   build-linux        ─┐
+#   e2e-linux  (×reg)   ┤
+#                       ├──→ analyze  → unified summary covering both OSes
+#   build-windows      ─┤
+#   e2e-windows (×reg)  ┘
 #
-# Per-OS pipelines run completely independently: Linux e2e jobs do NOT
-# wait for the Windows VSIX build (and vice versa), so a slow Windows
-# build cannot delay the start of Linux e2e plans. Each matrix cell
-# surfaces as its own PR check, so failures are visible without an
-# extra gate job.
+#   build-base-linux   ─┐ (PR only, only when a repro-issue-*.yaml exists)
+#   repro-gate-linux   ─┤   builds the PR *base* (un-fixed) VSIX, runs the
+#   build-base-windows ─┤   repro plan against BOTH base and head, and proves
+#   repro-gate-windows ─┘   the bug is RED on base and GREEN on head.
+#
+# Per-OS pipelines run completely independently: Linux jobs do NOT wait for
+# the Windows VSIX build (and vice versa). Each matrix cell surfaces as its
+# own PR check, so failures are visible without an extra gate job.
+#
+# ── Red→green gate (model A) ────────────────────────────────
+# A regression plan run once only ever proves GREEN on the fixed code. To
+# prove a Copilot-authored repro plan genuinely captures the bug, the gate
+# rebuilds the PR's *base* commit (main, un-fixed) into its own VSIX and runs
+# the SAME repro-issue-<n>.yaml against base AND head in one CI run:
+#     base (main, un-fixed) → expect ❌ RED  (bug reproduced)
+#     head (PR, fixed)      → expect ✅ GREEN (fix works)
+# .github/scripts/repro-gate.js reads both results.json files and passes the
+# check only when base failed a deterministic assertion and head is all-pass
+# (distinguishing a genuine assertion RED from an infra crash). The gate runs
+# only on pull_request events; after merge (push to main) the base already
+# contains the fix, so the plan is demoted to an ordinary GREEN regression.
 #
 # discover-plans globs test/e2e-plans/*.yaml, so a Copilot-authored
 # repro-issue-<n>.yaml is picked up automatically. Each plan's full
@@ -65,7 +83,9 @@ jobs:
     name: Discover E2E Plans
     runs-on: ubuntu-latest
     outputs:
-      matrix: ${{ steps.scan.outputs.matrix }}
+      regression: ${{ steps.scan.outputs.regression }}
+      repro: ${{ steps.scan.outputs.repro }}
+      has_repro: ${{ steps.scan.outputs.has_repro }}
     steps:
       - uses: actions/checkout@v4
 
@@ -73,9 +93,31 @@ jobs:
         id: scan
         shell: bash
         run: |
-          plans=$(ls test/e2e-plans/*.yaml | xargs -n1 basename | sed 's/\.yaml$//' | jq -R . | jq -sc .)
-          echo "matrix=$plans" >> "$GITHUB_OUTPUT"
-          echo "Found plans: $plans"
+          all=$(ls test/e2e-plans/*.yaml | xargs -n1 basename | sed 's/\.yaml$//')
+          repro=$(printf '%s\n' "$all" | grep '^repro-issue-' || true)
+          regression=$(printf '%s\n' "$all" | grep -v '^repro-issue-' || true)
+
+          # The red→green gate only makes sense on a PR (it compares base vs head).
+          # On push to main the base already contains the fix, so run every
+          # plan — including repro-issue-* — as an ordinary GREEN regression.
+          if [ "${{ github.event_name }}" != "pull_request" ]; then
+            regression="$all"
+            repro=""
+          fi
+
+          to_json() { printf '%s\n' "$1" | { grep -v '^[[:space:]]*$' || true; } | jq -R . | jq -sc .; }
+          reg_json=$(to_json "$regression")
+          repro_json=$(to_json "$repro")
+
+          echo "regression=$reg_json" >> "$GITHUB_OUTPUT"
+          echo "repro=$repro_json"     >> "$GITHUB_OUTPUT"
+          if [ "$repro_json" = "[]" ]; then
+            echo "has_repro=false" >> "$GITHUB_OUTPUT"
+          else
+            echo "has_repro=true" >> "$GITHUB_OUTPUT"
+          fi
+          echo "Regression plans: $reg_json"
+          echo "Repro plans:      $repro_json"
 
   # ── Build VSIX (Linux) ──────────────────────────────────
   build-linux:
@@ -162,7 +204,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.matrix) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.regression) }}
 
     steps:
       - uses: actions/checkout@v4
@@ -225,7 +267,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.matrix) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.regression) }}
 
     steps:
       - uses: actions/checkout@v4
@@ -267,6 +309,240 @@ jobs:
           path: test-results/
           retention-days: 7
 
+  # ── Build base (un-fixed) VSIX for the red→green gate ───
+  # Only runs on PRs that add/contain a repro-issue-*.yaml plan. Checks out
+  # the PR's base commit (main, before the fix) and packages it so the gate
+  # can prove the repro plan is RED on un-fixed code.
+  build-base-linux:
+    name: Build base VSIX (Linux)
+    needs: [ discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install VSCE
+        run: npm install -g @vscode/vsce
+
+      - name: Build OSGi bundle
+        run: npm run build-server
+
+      - name: Build base VSIX file
+        run: vsce package -o vscode-java-dependency-base.vsix
+
+      - name: Upload base VSIX artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: vsix-base-linux
+          path: vscode-java-dependency-base.vsix
+          retention-days: 1
+
+  build-base-windows:
+    name: Build base VSIX (Windows)
+    needs: [ discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    runs-on: windows-latest
+    timeout-minutes: 20
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Install Node.js modules
+        run: npm install
+
+      - name: Install VSCE
+        run: npm install -g @vscode/vsce
+
+      - name: Build OSGi bundle
+        run: npm run build-server
+
+      - name: Build base VSIX file
+        run: vsce package -o vscode-java-dependency-base.vsix
+
+      - name: Upload base VSIX artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: vsix-base-windows
+          path: vscode-java-dependency-base.vsix
+          retention-days: 1
+
+  # ── Red→green gate (Linux) ──────────────────────────────
+  # Runs each repro-issue-*.yaml against the base (un-fixed) VSIX and the
+  # head (fixed) VSIX, then repro-gate.js proves base=RED and head=GREEN.
+  repro-gate-linux:
+    name: Repro Gate Linux (${{ matrix.plan }})
+    needs: [ build-linux, build-base-linux, discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    runs-on: ubuntu-latest
+    timeout-minutes: 40
+    strategy:
+      fail-fast: false
+      matrix:
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro) }}
+
+    steps:
+      - uses: actions/checkout@v4   # head checkout provides the repro plan yaml
+
+      - name: Setup Build Environment (Xvfb)
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libxkbfile-dev pkg-config libsecret-1-dev libxss1 dbus xvfb libgtk-3-0 libgbm1
+          sudo /usr/bin/Xvfb :99 -screen 0 1920x1080x24 > /dev/null 2>&1 &
+          sleep 3
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Setup autotest
+        run: npm install -g @vscjava/vscode-autotest
+
+      - name: Download head VSIX (fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-linux
+          path: .
+
+      - name: Download base VSIX (un-fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-base-linux
+          path: .
+
+      - name: Run repro plan on base then head
+        shell: bash
+        run: |
+          # The base run is EXPECTED to exit non-zero (RED), so do not let the
+          # step fail here; repro-gate.js is the sole judge.
+          set +e
+          DISPLAY=:99 autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" \
+            --vsix "$(pwd)/vscode-java-dependency-base.vsix" \
+            --no-llm --output "test-results/base-${{ matrix.plan }}"
+          echo "base autotest exit: $?"
+          DISPLAY=:99 autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" \
+            --vsix "$(pwd)/vscode-java-dependency.vsix" \
+            --no-llm --output "test-results/head-${{ matrix.plan }}"
+          echo "head autotest exit: $?"
+          set -e
+
+      - name: Judge red→green
+        shell: bash
+        run: |
+          node .github/scripts/repro-gate.js \
+            "test-results/base-${{ matrix.plan }}/results.json" \
+            "test-results/head-${{ matrix.plan }}/results.json" \
+            "${{ matrix.plan }}" "Linux"
+
+      - name: Upload gate results
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: repro-gate-results-linux-${{ matrix.plan }}
+          path: test-results/
+          retention-days: 7
+
+  # ── Red→green gate (Windows) ────────────────────────────
+  repro-gate-windows:
+    name: Repro Gate Windows (${{ matrix.plan }})
+    needs: [ build-windows, build-base-windows, discover-plans ]
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    runs-on: windows-latest
+    timeout-minutes: 40
+    strategy:
+      fail-fast: false
+      matrix:
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro) }}
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up JDK 21
+        uses: actions/setup-java@v4
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Setup Node.js environment
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Setup autotest
+        run: npm install -g @vscjava/vscode-autotest
+
+      - name: Download head VSIX (fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-windows
+          path: .
+
+      - name: Download base VSIX (un-fixed)
+        uses: actions/download-artifact@v4
+        with:
+          name: vsix-base-windows
+          path: .
+
+      - name: Run repro plan on base then head
+        shell: pwsh
+        run: |
+          $head = "$((Get-Location).Path)\vscode-java-dependency.vsix"
+          $base = "$((Get-Location).Path)\vscode-java-dependency-base.vsix"
+          autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" --vsix "$base" --no-llm --output "test-results\base-${{ matrix.plan }}"
+          Write-Host "base autotest exit: $LASTEXITCODE"
+          autotest run "test/e2e-plans/${{ matrix.plan }}.yaml" --vsix "$head" --no-llm --output "test-results\head-${{ matrix.plan }}"
+          Write-Host "head autotest exit: $LASTEXITCODE"
+          # Do not fail the step on a non-zero autotest exit — the gate judges.
+          exit 0
+
+      - name: Judge red→green
+        shell: pwsh
+        run: |
+          node .github/scripts/repro-gate.js "test-results\base-${{ matrix.plan }}\results.json" "test-results\head-${{ matrix.plan }}\results.json" "${{ matrix.plan }}" "Windows"
+
+      - name: Upload gate results
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: repro-gate-results-windows-${{ matrix.plan }}
+          path: test-results/
+          retention-days: 7
+
   # ── Unified analysis across both OSes ───────────────────
   analyze:
     name: E2E Summary

From e011054f88ef0595ab69728316b593f88d8e709e Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 16:09:45 +0800
Subject: [PATCH 05/10] docs: gate the repro/UI-test flow behind an explicit
 task triage

Make clear the repro path is opt-in, not automatic for every assigned
issue. Only reproducible bugs enter the repro/uitest flow; features,
refactors, dep bumps, docs, and non-reproducible reports take an ordinary
PR with no repro-issue-*.yaml. Since the CI red->green gate only fires
when a repro plan file is present, the routing decision is made purely by
whether a plan is committed. Also clarify that lint + java-dep-* regression
E2E always run, while the gate is the additional opt-in check.
---
 .github/copilot-instructions.md | 6 ++++--
 .github/skills/repro/SKILL.md   | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)
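
Illustration (not part of the patch): the triage rule in this commit means the gate is opted into purely by committing a plan file. The bash sketch below shows that check under stated assumptions. `would_trigger_gate` is a hypothetical helper name, not something in the repository, and the real workflow additionally requires a `pull_request` event before the gate runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given plans directory holds at least
# one repro-issue-*.yaml, i.e. when the PR has opted into the red->green gate.
would_trigger_gate() {
  local dir="$1" f
  for f in "$dir"/repro-issue-*.yaml; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

if would_trigger_gate test/e2e-plans; then
  echo "repro plan present: the gate runs on the PR"
else
  echo "no repro plan: only lint + java-dep-* regression E2E run"
fi
```

Running this from the repository root before pushing is one way to confirm that a feature or refactor PR has not accidentally picked up a repro plan.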

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 24de89ab..1df6670c 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -2,8 +2,10 @@
 
 ## Bug reproduction
 
-- When an issue is assigned to Copilot, or you are asked to reproduce or confirm a reported bug, use the `repro` skill.
-- First decide whether the bug needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
+- **Classify the task first — the repro / UI-test flow is opt-in, not automatic.** Use the `repro` skill **only** when the task is to fix or confirm a **reproducible bug** (an issue that carries repro steps + a project, or you are explicitly asked to reproduce/confirm a report). For everything else — new features, refactors, performance work, dependency/version bumps, docs, config, CI, or code cleanup — make a normal PR with the appropriate unit/integration tests and **do not** author a `test/e2e-plans/repro-issue-*.yaml`. No repro plan file means the CI red→green gate never triggers; nothing extra runs.
+- **What always runs vs what is opt-in:** every PR to `main` still gets lint + the existing `java-dep-*` regression E2E (unchanged safety net). The red→green **gate is additional and fires only when the PR contains a `repro-issue-<n>.yaml`.** So the decision to enter this flow is made purely by whether you commit a repro plan.
+- If a report is **not reproducible** (vague, missing project, environment- or hardware-specific, depends on an external service), do **not** force a reproduction or invent a plan: ask for a minimal repro and label `needs-more-info`, or fix with the best available non-UI test and say so.
+- When you have decided the task **is** a reproducible bug: first decide whether it needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
 - Reproduce with the reporter's project: clone the linked repo as a sibling or recreate the zip/inline sources, then distill it to a **minimal committed fixture**. Do not commit whole user projects or large binaries.
 - Author the reproduction so it fails on the current build and passes after the fix, and leave it committed as a regression test (a new `test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically). Commit the repro plan and the fix **together in one PR** — CI's red→green gate rebuilds the PR base to prove the red, so you never push a knowingly-broken commit.
 - If no reproducible project is provided and the bug is environment-specific, ask for one and label `needs-more-info` — do not fabricate a fix for an unreproduced bug.
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index 7c69a4ed..bb7b34f2 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -5,7 +5,9 @@ description: Reproduce a reported vscode-java-dependency (Project Manager for Ja
 
 # Reproduce a reported bug
 
-Use this skill when an issue is assigned to Copilot (or you are asked to reproduce/confirm a report) for `vscode-java-dependency` (Project Manager for Java).
+Use this skill when the task is to fix or confirm a **reproducible bug** in `vscode-java-dependency` (Project Manager for Java) — an issue that carries repro steps + a project, or an explicit request to reproduce/confirm a report.
+
+**Do NOT use this skill (and do not author a `repro-issue-*.yaml`) when the task is not a reproducible bug**, e.g. a new feature, refactor, performance work, dependency/version bump, docs, config, CI, or code cleanup — those are ordinary PRs with ordinary unit/integration tests. Also skip it when a report is **not reproducible** (vague, no project, environment/hardware-specific, external service): ask for a minimal repro and label `needs-more-info`, or fix with the best available non-UI test — never invent a repro plan just to have one. The CI red→green gate only triggers when a `repro-issue-<n>.yaml` is present, so not entering this flow means nothing extra runs.
 
 Goal: turn a bug report into a **deterministic, committed reproduction** that fails before the fix and passes after it. Prefer the smallest reproduction that proves the bug. Not every bug needs a UI test — decide first.
 

From 471666a1a8bc78dd6ba15dbea7822e348efa89d4 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 16:22:48 +0800
Subject: [PATCH 06/10] docs(repro): add concrete flow for a zip attached to
 the issue

The skill listed 'attached zip' as a repro source but never said how to
obtain it. Add a download+unzip recipe (curl -L follows the user-attachments
302 to a signed objects.githubusercontent.com URL), note that github.com +
objects.githubusercontent.com + codeload.github.com are on the coding-agent
firewall's DEFAULT allowlist (so attachment downloads and repo clones are NOT
blocked, unlike the VS Code binary), handle signed-URL expiry, point the plan
workspace at the extracted project, and treat the archive as untrusted input
(extract only, commit just the minimal distilled fixture).
---
 .github/skills/repro/SKILL.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)
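
Illustration (not part of the patch): the "treat the archive as untrusted input" rule asks for a layout check before the extracted project is used as a workspace. The bash sketch below shows one way to do that. `looks_like_java_project` and the placeholder path are hypothetical; the check only inspects files and never runs anything from the archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when an extracted directory has a Maven or
# Gradle build file plus a src/ tree. It only tests for files and directories;
# nothing from the archive is executed.
looks_like_java_project() {
  local dir="$1"
  if [ ! -d "$dir/src" ]; then
    return 1
  fi
  [ -f "$dir/pom.xml" ] || [ -f "$dir/build.gradle" ]
}

# Placeholder path: substitute the directory the zip was extracted into.
extracted="../repro-issue-example"
if looks_like_java_project "$extracted"; then
  echo "ordinary Java project: usable as the AutoTest workspace"
else
  echo "unexpected layout: inspect by hand before using it"
fi
```

The check is deliberately narrow: it mirrors only the `pom.xml` / `build.gradle` + `src/` criterion stated in the skill text, so other layouts fall through to manual inspection.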

diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index bb7b34f2..ecd007b7 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -15,7 +15,7 @@ Goal: turn a bug report into a **deterministic, committed reproduction** that fa
 
 From the issue body (and the `bug_report` template fields) collect:
 
-- **Repro project** — a public GitHub repo link, an attached zip, or an inline `pom.xml` / `build.gradle` + sources. If none is provided and the bug is environment-specific, ask for one and label the issue `needs-more-info` instead of guessing.
+- **Repro project** — a public GitHub repo link, an attached zip (a `https://github.com/user-attachments/files/<id>/<name>.zip` link in the issue body), or an inline `pom.xml` / `build.gradle` + sources. If none is provided and the bug is environment-specific, ask for one and label the issue `needs-more-info` instead of guessing.
 - **Steps to reproduce**, **expected** vs **actual** behavior, and the affected surface (tree view, context menu, command id, classpath, export jar, project creation, etc.).
 - **Versions** — VS Code, Extension Pack for Java, JDK, OS.
 
@@ -51,7 +51,23 @@ Keep the committed footprint small and CI-reproducible:
   git clone --depth 1 <repo-url> ..\repro-issue-<n>
   ```
 
-- **Zip / inline**: recreate the project under `test\e2e-fixtures\issue-<n>\` (or reuse `test/maven` / `test/invisible` if the existing fixtures already trigger the bug).
+  (`github.com` and `codeload.github.com` are on the coding-agent firewall's default allowlist, so the clone is not blocked.)
+
+- **Attached zip**: the issue body carries a link like `https://github.com/user-attachments/files/<id>/<name>.zip`. Download it (following the redirect) and unzip into a sibling dir, then point the plan's `workspace` at the extracted project:
+
+  ```powershell
+  # The user-attachments link 302-redirects to a signed objects.githubusercontent.com
+  # URL. BOTH github.com and objects.githubusercontent.com are on the coding-agent
+  # firewall's default allowlist, so this download is NOT blocked (unlike the VS Code
+  # binary). Use -L to follow the redirect. If the signed URL has expired, re-read the
+  # issue to get a fresh link, then re-download.
+  curl -L -o ..\repro-issue-<n>.zip "https://github.com/user-attachments/files/<id>/<name>.zip"
+  Expand-Archive ..\repro-issue-<n>.zip -DestinationPath ..\repro-issue-<n>   # bash: unzip
+  ```
+
+  **Treat the archive as untrusted input**: extract only — do not run its build scripts, Maven/Gradle wrappers, or other executables blindly. Confirm it is an ordinary Java project (`pom.xml` / `build.gradle` + `src/`), use it as the AutoTest `workspace:`, and commit only the minimal distilled fixture (never the raw zip or build outputs).
+
+- **Inline sources**: recreate the project under `test\e2e-fixtures\issue-<n>\` (or reuse `test/maven` / `test/invisible` if the existing fixtures already trigger the bug).
 - Once reproduced, **distill it to the minimal fixture** that still fails and commit that (not the whole user project) so the regression test runs in CI without external clones or large binaries.
 
 ## 4. Reproduce
@@ -112,4 +128,5 @@ Every PR or comment must state **how you reproduced** (UI plan vs unit test vs c
 - Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
 - **Screenshots / results are captured for you by CI, not by hand.** When a `repro-issue-<n>.yaml` lands on a PR (base `main`), the red→green gate (§5) runs it on Linux **and** Windows against the base and head builds and uploads the whole `test-results/` directory (screenshots + `results.json` for both the base RED and head GREEN runs) as `repro-gate-results-<os>-<plan>` artifacts, plus a `base ❌ RED → head ✅ GREEN` verdict in the job summary. In the PR body, reference those artifacts and the verdict as the fix-proof, and paste the `results.json` failure reason from your own local red run — you do not need to attach images manually. (Ordinary `java-dep-*.yaml` regression plans still upload `e2e-results-<os>-<plan>` from a single green run.)
 - Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
+- **Issue attachments and repo clones are downloadable — they are NOT firewall-blocked.** `github.com`, `objects.githubusercontent.com`, `*.githubusercontent.com`, and `codeload.github.com` are all on the coding-agent's default allowlist, so cloning a linked public repo and `curl -L`-downloading an attached `user-attachments` zip both work at run time. (Only the VS Code binary host `update.code.visualstudio.com` is not allowlisted — that is why it is pre-warmed instead, see above.) Extract user-supplied zips as untrusted data: do not run their build scripts blindly.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.

From 901f7d9b7ef062452006a4536d443c787c9083b0 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Thu, 2 Jul 2026 16:50:21 +0800
Subject: [PATCH 07/10] ci: make the red->green gate OS-aware via repro plan
 filename suffix

An OS-specific bug (e.g. the Windows-only Copy Relative Path drive-letter
issue) does not manifest on the other OS, so running its repro plan through
that OS's gate would spuriously report NOT_REPRODUCED. discover-plans now
routes repro plans by filename suffix:
  repro-issue-<n>-windows.yaml -> Windows gate only
  repro-issue-<n>-linux.yaml   -> Linux gate only
  repro-issue-<n>.yaml         -> both gates (OS-agnostic)
build-base-* and repro-gate-* are gated per-OS (has_repro_linux /
has_repro_windows). Documented the naming convention in repro/SKILL.md.
---
 .github/skills/repro/SKILL.md | 10 ++++++-
 .github/workflows/e2eUI.yml   | 49 +++++++++++++++++++++--------------
 2 files changed, 39 insertions(+), 20 deletions(-)
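
Illustration (not part of the patch): the suffix routing described in this commit can be restated as a single `case` statement. This is a sketch only; the workflow itself uses two `grep -v` filters, and `gates_for_plan` plus the sample plan names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the gate(s) a plan basename is routed to on a
# pull request, following the same suffix rule as discover-plans.
gates_for_plan() {
  case "$1" in
    repro-issue-*-windows) echo "windows" ;;        # Windows gate only
    repro-issue-*-linux)   echo "linux" ;;          # Linux gate only
    repro-issue-*)         echo "linux windows" ;;  # OS-agnostic: both gates
    *)                     echo "regression" ;;     # ordinary single green run
  esac
}

# Sample basenames are made up for the demonstration.
for plan in repro-issue-42-windows repro-issue-42-linux repro-issue-42 java-dep-basic; do
  printf '%-24s -> %s\n' "$plan" "$(gates_for_plan "$plan")"
done
```

The order of the `case` arms matters: the suffixed patterns must come before the plain `repro-issue-*` pattern, otherwise every repro plan would be routed to both gates.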

diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index ecd007b7..81f77b50 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -81,6 +81,14 @@ npx @vscode/vsce package -o vscode-java-dependency.vsix
 npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix vscode-java-dependency.vsix --no-llm --output test-results\repro-issue-<n>
 ```
 
+**Name the plan for the OS the bug affects** — the red→green gate (§5) keys off the filename suffix:
+
+- `repro-issue-<n>-windows.yaml` — a **Windows-only** bug (e.g. drive-letter / path-separator / `\`-vs-`/` issues). The gate runs it on **Windows only**; the Linux gate skips it (the bug does not manifest there, so a Linux run would spuriously report `NOT_REPRODUCED`).
+- `repro-issue-<n>-linux.yaml` — a **Linux-only** bug. The Windows gate skips it.
+- `repro-issue-<n>.yaml` — an **OS-agnostic** bug. The gate runs it on **both** Linux and Windows and both must go red→green.
+
+Pick the suffix from the report's platform: if the issue only reproduces on one OS, use that OS's suffix; only use the plain name when you have confirmed the bug is platform-independent.
+
 Author the plan step-by-step for the **actions**, but you do not need a verifier on every step — put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and keep the red-run screenshot as before-fix evidence.
 
 **Run this on the un-fixed checkout FIRST — see RED before you write the fix.** That is the whole point of the reproduction: build + run the plan against the current (buggy) product code and confirm the decisive verifier fails with the reported symptom. Only then move to §5 and write the fix. This local red→green loop is fast in the agent env (VS Code is pre-warmed) and is what gives you confidence the plan actually reproduces before CI re-proves it.
@@ -99,7 +107,7 @@ Author the plan step-by-step for the **actions**, but you do not need a verifier
 
 A regression plan run once only ever proves GREEN on the fixed code. So for a `repro-issue-<n>.yaml`, `.github/workflows/e2eUI.yml` runs a dedicated **red→green gate** that is the authoritative machine proof — you do **not** have to reproduce the red→green in the PR body by argument:
 
-- On a pull request, the gate **rebuilds the PR's base commit** (`main`, before your fix) into its own VSIX, then runs your repro plan against **both** builds in one CI run:
+- On a pull request, the gate **rebuilds the PR's base commit** (`main`, before your fix) into its own VSIX, then runs your repro plan against **both** builds in one CI run, on the OS(es) implied by the filename suffix (`-windows` / `-linux` / none = both, see §4):
   - **base (un-fixed) → must be ❌ RED** — a deterministic assertion `fail` (not a crash/error), proving the plan reproduces the bug.
   - **head (fix) → must be ✅ GREEN** — all steps pass, proving the fix works.
 - `.github/scripts/repro-gate.js` reads both `results.json` files and passes the check only for `base RED && head GREEN`. It fails with a clear verdict otherwise:
diff --git a/.github/workflows/e2eUI.yml b/.github/workflows/e2eUI.yml
index 77a707e2..9526cbbc 100644
--- a/.github/workflows/e2eUI.yml
+++ b/.github/workflows/e2eUI.yml
@@ -84,8 +84,10 @@ jobs:
     runs-on: ubuntu-latest
     outputs:
       regression: ${{ steps.scan.outputs.regression }}
-      repro: ${{ steps.scan.outputs.repro }}
-      has_repro: ${{ steps.scan.outputs.has_repro }}
+      repro_linux: ${{ steps.scan.outputs.repro_linux }}
+      repro_windows: ${{ steps.scan.outputs.repro_windows }}
+      has_repro_linux: ${{ steps.scan.outputs.has_repro_linux }}
+      has_repro_windows: ${{ steps.scan.outputs.has_repro_windows }}
     steps:
       - uses: actions/checkout@v4
 
@@ -105,19 +107,28 @@ jobs:
             repro=""
           fi
 
+          # OS-specific bugs: a repro plan can target one OS by filename suffix.
+          #   repro-issue-<n>-windows.yaml → Windows gate only
+          #   repro-issue-<n>-linux.yaml   → Linux gate only
+          #   repro-issue-<n>.yaml         → both gates (OS-agnostic bug)
+          # This avoids a Windows-only bug being reported NOT_REPRODUCED on the
+          # Linux gate (where the bug simply does not manifest), and vice versa.
+          repro_linux=$(printf '%s\n' "$repro"   | grep -v -- '-windows$' || true)
+          repro_windows=$(printf '%s\n' "$repro" | grep -v -- '-linux$'   || true)
+
           to_json() { printf '%s\n' "$1" | { grep -v '^[[:space:]]*$' || true; } | jq -R . | jq -sc .; }
           reg_json=$(to_json "$regression")
-          repro_json=$(to_json "$repro")
-
-          echo "regression=$reg_json" >> "$GITHUB_OUTPUT"
-          echo "repro=$repro_json"     >> "$GITHUB_OUTPUT"
-          if [ "$repro_json" = "[]" ]; then
-            echo "has_repro=false" >> "$GITHUB_OUTPUT"
-          else
-            echo "has_repro=true" >> "$GITHUB_OUTPUT"
-          fi
-          echo "Regression plans: $reg_json"
-          echo "Repro plans:      $repro_json"
+          linux_json=$(to_json "$repro_linux")
+          windows_json=$(to_json "$repro_windows")
+
+          echo "regression=$reg_json"      >> "$GITHUB_OUTPUT"
+          echo "repro_linux=$linux_json"   >> "$GITHUB_OUTPUT"
+          echo "repro_windows=$windows_json" >> "$GITHUB_OUTPUT"
+          [ "$linux_json"   = "[]" ] && echo "has_repro_linux=false"   >> "$GITHUB_OUTPUT" || echo "has_repro_linux=true"   >> "$GITHUB_OUTPUT"
+          [ "$windows_json" = "[]" ] && echo "has_repro_windows=false" >> "$GITHUB_OUTPUT" || echo "has_repro_windows=true" >> "$GITHUB_OUTPUT"
+          echo "Regression plans:   $reg_json"
+          echo "Repro (Linux gate): $linux_json"
+          echo "Repro (Win gate):   $windows_json"
 
   # ── Build VSIX (Linux) ──────────────────────────────────
   build-linux:
@@ -316,7 +327,7 @@ jobs:
   build-base-linux:
     name: Build base VSIX (Linux)
     needs: [ discover-plans ]
-    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_linux == 'true' }}
     runs-on: ubuntu-latest
     timeout-minutes: 20
     steps:
@@ -357,7 +368,7 @@ jobs:
   build-base-windows:
     name: Build base VSIX (Windows)
     needs: [ discover-plans ]
-    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_windows == 'true' }}
     runs-on: windows-latest
     timeout-minutes: 20
     steps:
@@ -401,13 +412,13 @@ jobs:
   repro-gate-linux:
     name: Repro Gate Linux (${{ matrix.plan }})
     needs: [ build-linux, build-base-linux, discover-plans ]
-    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_linux == 'true' }}
     runs-on: ubuntu-latest
     timeout-minutes: 40
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.repro) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro_linux) }}
 
     steps:
       - uses: actions/checkout@v4   # head checkout provides the repro plan yaml
@@ -481,13 +492,13 @@ jobs:
   repro-gate-windows:
     name: Repro Gate Windows (${{ matrix.plan }})
     needs: [ build-windows, build-base-windows, discover-plans ]
-    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro == 'true' }}
+    if: ${{ github.event_name == 'pull_request' && needs.discover-plans.outputs.has_repro_windows == 'true' }}
     runs-on: windows-latest
     timeout-minutes: 40
     strategy:
       fail-fast: false
       matrix:
-        plan: ${{ fromJson(needs.discover-plans.outputs.repro) }}
+        plan: ${{ fromJson(needs.discover-plans.outputs.repro_windows) }}
 
     steps:
       - uses: actions/checkout@v4

From 9000d9473aa979d4004d54da6df10d9e5d55b99e Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Fri, 3 Jul 2026 09:14:58 +0800
Subject: [PATCH 08/10] fix(ci): make discover-plans tolerate an empty per-OS
 repro list

An OS-specific repro plan (e.g. repro-issue-<n>-windows.yaml) leaves the
opposite OS list empty. The old to_json() piped its input through
'grep -v', which exits 1 when it filters out every line; under
'bash -eo pipefail' that aborted the whole discover-plans step, so
build-base/e2e/repro-gate were all skipped and the red->green gate never
ran. to_json() now neutralises grep's no-output exit status and emits
'[]' for empty input.
---
 .github/workflows/e2eUI.yml | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
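
Reviewer note (outside the commit message): the failure mode described above can be reproduced without the workflow. The sketch below is a minimal illustration, not the workflow's code: `exclude_suffix` and `drop_blank` are hypothetical helper names, the plan name is made up, and `cat` stands in for the `jq` stages so the snippet needs nothing beyond bash and grep.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Old behaviour: grep -v prints nothing for blank-only input and exits 1,
# and pipefail turns that into the exit status of the whole pipeline.
if printf '%s\n' "" | grep -v '^[[:space:]]*$' | cat; then
  echo "old pipeline: ok"
else
  echo "old pipeline: non-zero exit status"
fi

# Hypothetical helper: keep every plan name except those ending in the suffix.
# '|| true' absorbs grep's exit status 1 when nothing survives the filter.
exclude_suffix() {
  printf '%s\n' "$1" | grep -v -- "$2\$" || true
}

# Hypothetical helper: drop blank lines, again tolerating an empty result.
drop_blank() {
  printf '%s\n' "$1" | grep -v '^[[:space:]]*$' || true
}

repro="repro-issue-12-windows"   # illustrative plan name, one per line
repro_linux=$(drop_blank "$(exclude_suffix "$repro" '-windows')")
repro_windows=$(drop_blank "$(exclude_suffix "$repro" '-linux')")

echo "linux:   '${repro_linux}'"
echo "windows: '${repro_windows}'"
```

Run as-is, the first branch should report a non-zero exit status, and the two lists should come out as an empty Linux list and a single Windows plan, which is the case the patched to_json() maps to '[]'.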

diff --git a/.github/workflows/e2eUI.yml b/.github/workflows/e2eUI.yml
index 9526cbbc..2031b8ab 100644
--- a/.github/workflows/e2eUI.yml
+++ b/.github/workflows/e2eUI.yml
@@ -116,7 +116,15 @@ jobs:
           repro_linux=$(printf '%s\n' "$repro"   | grep -v -- '-windows$' || true)
           repro_windows=$(printf '%s\n' "$repro" | grep -v -- '-linux$'   || true)
 
-          to_json() { printf '%s\n' "$1" | grep -v '^[[:space:]]*$' | jq -R . | jq -sc .; }
+          to_json() {
+            local cleaned
+            cleaned=$(printf '%s\n' "$1" | grep -v '^[[:space:]]*$' || true)
+            if [ -z "$cleaned" ]; then
+              echo '[]'
+            else
+              printf '%s\n' "$cleaned" | jq -R . | jq -sc .
+            fi
+          }
           reg_json=$(to_json "$regression")
           linux_json=$(to_json "$repro_linux")
           windows_json=$(to_json "$repro_windows")

From acb0208422974fc8ccdc74c7966efc133f75f803 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Fri, 3 Jul 2026 10:33:16 +0800
Subject: [PATCH 09/10] docs(repro): make agent self-run the primary loop; CI
 is an OS-specific fallback

Reframe the bug-repro workflow around a convergent, evidence-producing loop
that Copilot closes in its OWN environment, instead of treating the machine
red->green CI gate as a mandatory authority:

- Primary proof surface = the agent's own env. Reproduce red, fix, run green,
  and ITERATE until green is observed. Distinguish a deterministic assertion
  'fail' (fix wrong -> read results.json and iterate) from a crash/error
  (flaky/infra -> re-run, never a repro signal). Escalate to a maintainer
  with evidence if a plausibly-correct fix stays red on a harness variant,
  rather than faking green.
- CI is demoted to (a) the execution surface for OS-specific plans the agent
  OS cannot reproduce (e.g. Windows-only on a Linux agent) and (b) an
  independent re-run + the always-on java-dep-* regression net. Documented how
  the agent reads CI back with gh run watch/download to keep iterating, and
  that action_required approval is expected and does not block the self-run
  loop.
- Evidence: commit only the two decisive before/after screenshots under
  test/e2e-evidence/<issue>/; raw test-results/ is now git-ignored so agents
  stop committing whole run dumps.

Updates .gitignore, .github/skills/repro/SKILL.md, .github/copilot-instructions.md.
---
 .github/copilot-instructions.md |  6 +--
 .github/skills/repro/SKILL.md   | 67 ++++++++++++++++++++++-----------
 .gitignore                      |  1 +
 3 files changed, 48 insertions(+), 26 deletions(-)
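
Reviewer note (outside the commit message): the fail-versus-crash distinction and the gate verdicts named in this patch can be summarised as a small decision table. The sketch below is only an illustration under stated assumptions: the status strings (`pass`, `fail`, `error`, `crash`), the function name `gate_verdict`, the `RED_TO_GREEN` label, and the precedence given to crashes are all hypothetical, and the actual logic lives in `.github/scripts/repro-gate.js`, which may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical status values: "pass", "fail" (deterministic assertion failure),
# and "error"/"crash" (infra problem). The real results.json schema may differ.
gate_verdict() {
  local base="$1" head="$2"
  case "$base:$head" in
    error:*|crash:*|*:error|*:crash) echo "INCONCLUSIVE" ;;   # flaky: re-run
    pass:*)                          echo "NOT_REPRODUCED" ;; # plan misses the bug
    fail:fail)                       echo "NOT_FIXED" ;;      # fix wrong or incomplete
    fail:pass)                       echo "RED_TO_GREEN" ;;   # hypothetical label
    *)                               echo "INCONCLUSIVE" ;;   # unknown status
  esac
}

gate_verdict fail pass
gate_verdict pass pass
gate_verdict fail crash
```

Checking crashes first reflects the rule that a crash or error is never a repro signal; whether the real gate applies the same precedence is an assumption here.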

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 1df6670c..293b7413 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -3,11 +3,11 @@
 ## Bug reproduction
 
 - **Classify the task first — the repro / UI-test flow is opt-in, not automatic.** Use the `repro` skill **only** when the task is to fix or confirm a **reproducible bug** (an issue that carries repro steps + a project, or you are explicitly asked to reproduce/confirm a report). For everything else — new features, refactors, performance work, dependency/version bumps, docs, config, CI, or code cleanup — make a normal PR with the appropriate unit/integration tests and **do not** author a `test/e2e-plans/repro-issue-*.yaml`. No repro plan file means the CI red→green gate never triggers; nothing extra runs.
-- **What always runs vs what is opt-in:** every PR to `main` still gets lint + the existing `java-dep-*` regression E2E (unchanged safety net). The red→green **gate is additional and fires only when the PR contains a `repro-issue-<n>.yaml`.** So the decision to enter this flow is made purely by whether you commit a repro plan.
+- **What always runs vs what is opt-in:** every PR to `main` still gets lint + the existing `java-dep-*` regression E2E (unchanged safety net). The red→green **gate is additional and fires only when the PR contains a `repro-issue-<n>.yaml`.** So the decision to enter this flow is made purely by whether you commit a repro plan. The gate is an **independent re-run and the execution surface for OS-specific plans** — your own run in the agent environment is the primary proof.
 - If a report is **not reproducible** (vague, missing project, environment- or hardware-specific, depends on an external service), do **not** force a reproduction or invent a plan: ask for a minimal repro and label `needs-more-info`, or fix with the best available non-UI test and say so.
 - When you have decided the task **is** a reproducible bug: first decide whether it needs a UI/E2E test. Use an AutoTest plan (`uitest` skill) for user-facing surfaces (Java Projects tree, context menus, commands, classpath, export jar, view modes). Use a `test/maven-suite` unit test or a `jdtls.ext` test for pure logic, backend, or build/packaging bugs.
 - Reproduce with the reporter's project: clone the linked repo as a sibling or recreate the zip/inline sources, then distill it to a **minimal committed fixture**. Do not commit whole user projects or large binaries.
-- Author the reproduction so it fails on the current build and passes after the fix, and leave it committed as a regression test (a new `test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically). Commit the repro plan and the fix **together in one PR** — CI's red→green gate rebuilds the PR base to prove the red, so you never push a knowingly-broken commit.
+- **Prove the red→green in your own environment first — CI is a fallback, not a requirement.** Run the plan/test yourself: red on the un-fixed build, green after the fix, **iterating until you observe green** (crash/error = flaky, re-run; assertion-`fail` = fix wrong/incomplete, read `results.json` and iterate). Leave it committed as a regression test (`test/e2e-plans/repro-issue-<n>.yaml` is picked up by CI automatically), and commit the repro plan and the fix **together in one PR**. For an **OS-specific** bug your agent OS cannot reproduce (e.g. a Windows-only bug on a Linux agent), name the plan `repro-issue-<n>-windows.yaml` / `-linux.yaml` and let CI's red→green gate run it on that OS; read the result back with `gh run watch` / `gh run download` and iterate.
 - If no reproducible project is provided and the bug is environment-specific, ask for one and label `needs-more-info` — do not fabricate a fix for an unreproduced bug.
 
 ## UI and E2E tests
@@ -16,4 +16,4 @@
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
 - Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Screenshots prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
-- Do not attach screenshots by hand: `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as artifacts. For a `repro-issue-<n>.yaml`, a **red→green gate** additionally rebuilds the PR base (un-fixed) and runs the plan against base **and** head, requiring `base ❌ RED → head ✅ GREEN`; its verdict + `repro-gate-results-<os>-<plan>` artifacts are the authoritative fix-proof. Ordinary regression plans upload `e2e-results-<os>-<plan>`. Reference the relevant artifacts as the fix-proof in a PR.
+- **Evidence: commit the two decisive screenshots; never commit raw `test-results/` (it is git-ignored).** For a self-run repro, copy the before (red) / after (green) of the decisive step into `test/e2e-evidence/repro-issue-<n>/` and reference them in the PR. `.github/workflows/e2eUI.yml` additionally runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads full `test-results/` as artifacts; for a `repro-issue-<n>.yaml` an OS-aware **red→green gate** rebuilds the PR base and runs the plan against base **and** head on the OS(es) the suffix implies, requiring `base ❌ RED → head ✅ GREEN` (`repro-gate-results-<os>-<plan>` artifacts). Use those for an OS-specific plan you could not run yourself; ordinary regression plans upload `e2e-results-<os>-<plan>`.
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index 81f77b50..d06868af 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -25,7 +25,7 @@ The reproduction and the fix-proof are two different questions — decide each:
 
 - **Reproduction** can often be non-UI or even a code read, especially for simple, obvious bugs. Prefer the cheapest reproduction that captures the report.
 - **Fix-proof** is where a UI/E2E test earns its cost: a red run before the fix and a green run after, with screenshots, is the strongest evidence for a user-facing bug. If the bug is user-facing, favour leaving a committed UI plan even when you first reproduced it another way.
-- **The red→green is proven by CI, not by prose.** When you commit a `test/e2e-plans/repro-issue-<n>.yaml`, `.github/workflows/e2eUI.yml` runs a **red→green gate** (see §5) that rebuilds the PR's base (un-fixed) code and runs your plan against base **and** head, requiring `base = RED, head = GREEN`. So your job is to author a plan whose decisive assertion **fails on the un-fixed build and passes on the fix** — the gate does the proving. Do not merely assert red→green in the PR body; make the plan actually reproduce.
+- **Prove the red→green with an actual run — first in your own environment.** Your **default proof surface is the agent's own environment**: build the product, run the plan/test yourself, and observe the decisive assertion **fail on the un-fixed code and pass on the fix**. That is the closed loop — no CI approval, and you see the screenshots directly (see §4/§5). CI is a **fallback only for OS-specific bugs your environment cannot reproduce** (e.g. a Windows-only bug when the agent runs on Linux) and an always-on regression net — it is **not** a required step for every repro. Never merely assert red→green in the PR body; make the plan actually reproduce, and **iterate until you have observed it go green**.
 
 **Use a UI/E2E AutoTest plan (`uitest` skill) when the bug is in the user-facing surface**, e.g.:
 
@@ -72,6 +72,8 @@ Keep the committed footprint small and CI-reproducible:
 
 ## 4. Reproduce
 
+**This whole step runs in your own environment — no CI needed.** Reproduce, fix, and prove the fix by running the plan/test yourself; CI only re-proves OS-specific cases (§5). VS Code is pre-warmed in the agent, so the local UI loop is fast.
+
 **UI path** — create `test/e2e-plans/repro-issue-<n>.yaml` following the `uitest` skill and `.github/instructions/uitest-plan.instructions.md`:
 
 ```powershell
@@ -81,11 +83,11 @@ npx @vscode/vsce package -o vscode-java-dependency.vsix
 npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix vscode-java-dependency.vsix --no-llm --output test-results\repro-issue-<n>
 ```
 
-**Name the plan for the OS the bug affects** — the red→green gate (§5) keys off the filename suffix:
+**If the bug is OS-specific, name the plan for that OS** — you may not be able to reproduce it in your own environment at all (e.g. a Windows-only bug while the agent runs on Linux). The filename suffix routes the **CI fallback** (§5) to the right OS:
 
-- `repro-issue-<n>-windows.yaml` — a **Windows-only** bug (e.g. drive-letter / path-separator / `\`-vs-`/` issues). The gate runs it on **Windows only**; the Linux gate skips it (the bug does not manifest there, so a Linux run would spuriously report `NOT_REPRODUCED`).
-- `repro-issue-<n>-linux.yaml` — a **Linux-only** bug. Windows gate skips it.
-- `repro-issue-<n>.yaml` — an **OS-agnostic** bug. The gate runs it on **both** Linux and Windows and both must go red→green.
+- `repro-issue-<n>-windows.yaml` — a **Windows-only** bug (e.g. drive-letter / path-separator / `\`-vs-`/` issues). CI runs it on **Windows only**; the Linux gate skips it (the bug does not manifest there, so a Linux run would spuriously report `NOT_REPRODUCED`). A Linux agent cannot prove this one itself — reproduce by reasoning + code read, commit the `-windows` plan, and let CI (§5) run the red→green on Windows.
+- `repro-issue-<n>-linux.yaml` — a **Linux-only** bug. CI's Windows gate skips it. A Linux agent **can** reproduce this one itself.
+- `repro-issue-<n>.yaml` — an **OS-agnostic** bug. You can reproduce and prove it entirely in your own environment; CI additionally re-runs it on **both** OSes as a regression net.
 
 Pick the suffix from the report's platform: if the issue only reproduces on one OS, use that OS's suffix; only use the plain name when you have confirmed the bug is platform-independent.
 
@@ -95,35 +97,54 @@ Author the plan step-by-step for the **actions**, but you do not need a verifier
 
 **Non-UI path** — add the failing `test/maven-suite` or `jdtls.ext` test and run the existing suite (`npm test`, or the `jdtls.ext` Maven test) to confirm it fails.
 
-## 5. Fix, then prove it
+## 5. Fix, then prove it — iterate until green
 
 1. Fix the product code (`src/**` for TS, `jdtls.ext/**` for the OSGi backend).
 2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
-3. Rerun the reproduction; the same plan/test must now pass (red → green).
-4. Capture both runs' evidence: the **before** (red) and **after** (green) results. The green run is the primary proof the fix works. You do **not** need to attach images by hand — when the plan is on the PR, `.github/workflows/e2eUI.yml` re-runs it on Linux + Windows and uploads the full `test-results/` (screenshots + `results.json`) as artifacts. Link those in the PR and paste the `results.json` reason from your own red run.
-5. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
+3. Rerun the reproduction **in your own environment**; the same plan/test must now pass (red → green).
+4. **Iterate until you observe green** — follow the convergent loop below.
+5. **Capture evidence.** Raw `test-results/` is **git-ignored — never commit it.** Deliver the proof one of two ways:
+   - **Self-run path** (you reproduced it yourself): copy only the **two decisive screenshots** — the before (red) and after (green) of the decisive step — into `test/e2e-evidence/repro-issue-<n>/` and reference them inline in the PR body, plus the red run's `results.json` failure reason. Keep it to those two images; never commit the whole run.
+   - **CI-fallback path** (OS-specific, see below): the screenshots live in the uploaded `repro-gate-results-<os>-<plan>` artifact — reference the artifact and the gate verdict; nothing is committed.
+6. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
+
+### Iterate until green (the convergent loop)
+
+After each build+run, read `test-results/repro-issue-<n>/results.json` and the decisive step's screenshot, then branch:
+
+- **Head GREEN (and base was RED)** → done; you have proven the fix. Go to evidence (step 5).
+- **Head still a deterministic assertion `fail`** → the fix is wrong or incomplete. Read the *actual* observed state in `results.json` (e.g. the clipboard text, the tree label) — it tells you what the code really produced. Form a new hypothesis, adjust the fix (or the plan, if it asserts the wrong thing), rebuild, and rerun.
+- **`error` / `crash` (not a clean `fail`)** → treat as a **flaky/infra result, not a repro signal**: the language server may not have become ready, the tree may not have loaded, or the editor may not have launched. Increase `waitFor`/`timeout`, add a settle step, and **re-run**; never conclude anything about the bug from a crash/error. (A Linux run of a `-windows` plan can fail this way too: that is an env error, not a reproduction.)
+
+Repeat build→run→analyze until head is green. If after several honest iterations the fix is plausibly correct but the plan still fails only because of a harness/environment variant (e.g. the fixture runs from a `%TEMP%` worktree whose path form differs from a real install), do **not** force it: escalate to a maintainer with the evidence and your analysis, and label `needs-human-review`. A loop that stops with an explained blocker beats a green you faked.
 
-### The CI red→green gate (authoritative proof)
+### CI: the OS-specific fallback and independent re-run
 
-A regression plan run once only ever proves GREEN on the fixed code. So for a `repro-issue-<n>.yaml`, `.github/workflows/e2eUI.yml` runs a dedicated **red→green gate** that is the authoritative machine proof — you do **not** have to reproduce the red→green in the PR body by argument:
+Your own run is the primary proof. CI adds two things on top — **neither is required for an OS-agnostic bug you already proved locally**:
+
+1. **The execution surface you may lack.** For an OS-specific plan (`-windows` / `-linux`) that your agent OS cannot reproduce, CI is where the red→green actually runs. Commit the plan + fix; on the PR, `.github/workflows/e2eUI.yml` rebuilds the base (un-fixed) VSIX and runs the plan against base **and** head on that OS, and `.github/scripts/repro-gate.js` requires `base ❌ RED → head ✅ GREEN`.
+2. **An independent re-run** that does not trust your committed artifacts, plus the always-on `java-dep-*` regression net. After merge (push to `main`) the base already contains the fix, so the plan is demoted to an ordinary GREEN regression check.
+
+Gate verdicts (treat them exactly like your own run): `NOT_REPRODUCED` (plan passed on the un-fixed base — tighten the decisive assertion to the **expected** behaviour), `NOT_FIXED` (head still fails — read the head `results.json` and iterate), `INCONCLUSIVE` (base or head crashed/errored — flaky, re-run). The `base ❌ RED → head ✅ GREEN` verdict + `repro-gate-results-<os>-<plan>` artifacts are the machine fix-proof for OS-specific plans.
+
+**Read CI back to close the loop from the agent** — pull the result and iterate without leaving the session:
+
+```bash
+rid=$(gh run list --branch "$BRANCH" --workflow "E2E UI Tests" -L1 --json databaseId -q '.[0].databaseId')
+gh run watch "$rid" || true
+gh run download "$rid" -n "repro-gate-results-windows-repro-issue-<n>" -D ci-evidence/
+# read ci-evidence/**/results.json + view the decisive screenshot, then fix/plan and push again
+```
 
-- On a pull request, the gate **rebuilds the PR's base commit** (`main`, before your fix) into its own VSIX, then runs your repro plan against **both** builds in one CI run, on the OS(es) implied by the filename suffix (`-windows` / `-linux` / none = both, see §4):
-  - **base (un-fixed) → must be ❌ RED** — a deterministic assertion `fail` (not a crash/error), proving the plan reproduces the bug.
-  - **head (fix) → must be ✅ GREEN** — all steps pass, proving the fix works.
-- `.github/scripts/repro-gate.js` reads both `results.json` files and passes the check only for `base RED && head GREEN`. It fails with a clear verdict otherwise:
-  - `NOT_REPRODUCED` — your plan passed on the un-fixed base, so it does **not** capture the bug. Tighten the decisive assertion so it asserts the **expected** behaviour.
-  - `NOT_FIXED` — head still fails; the bug is not resolved.
-  - `INCONCLUSIVE` — base or head crashed/errored (infra flake); re-run the job.
-- The gate's verdict table (`base ❌ RED → head ✅ GREEN`) is written to the job summary, and both runs' `test-results/` are uploaded as `repro-gate-results-<os>-<plan>` artifacts (screenshots + `results.json`). **This is the fix-proof** — reference it in the PR.
-- The gate runs only on `pull_request` events. After merge (push to `main`) the base already contains the fix, so the same plan is demoted to an ordinary GREEN regression check.
+> **Approval note:** CI on a Copilot-authored PR may sit in `action_required` until a maintainer clicks **Approve and run**. That is expected and it does **not** block the self-run loop (which needs no CI). For an OS-specific bug, ask the maintainer to approve once, then read the result back as above.
 
-Because CI reconstructs the red from the base commit, your PR stays a single clean PR — **commit the repro plan and the fix together**; you never have to push a knowingly-broken commit to demonstrate the red.
+Because CI reconstructs the red from the base commit, your PR stays a single clean PR — **commit the repro plan and the fix together**; you never push a knowingly-broken commit.
 
 ## 6. Report back
 
 Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green, or could not execute — and why). Never claim a green run you did not observe.
 
-- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**, and let the red→green gate (§5) prove it. In the PR body, reference the gate's `repro-gate-results-<os>-<plan>` artifacts and its `base ❌ RED → head ✅ GREEN` verdict, and cite the failing step / `results.json` reason from your own local red run. Reference the issue.
+- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**. State that you ran it red→green **in your own environment**, and show the proof: the two decisive before/after screenshots you committed under `test/e2e-evidence/repro-issue-<n>/` (self-run path) or — for an OS-specific plan you could not run yourself — the CI gate's `repro-gate-results-<os>-<plan>` artifacts and its `base ❌ RED → head ✅ GREEN` verdict. Cite the failing step / `results.json` reason from your red run. Reference the issue.
 - **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
 - **Reproduced but could not run the UI test**: remember a `(dns block)` on `update.code.visualstudio.com` is expected and non-fatal (see Environment notes) — it is **not** a reason to skip the UI path. Only if the editor genuinely never launches, commit the plan, explain the real failure, and fall back to a non-UI proof or ask a maintainer to unblock.
 - **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
@@ -134,7 +155,7 @@ Every PR or comment must state **how you reproduced** (UI plan vs unit test vs c
 - That setup runs **before the agent firewall**, and its final step pre-downloads the **latest** VS Code (`stable`) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). Keep the plans on `vscodeVersion: "stable"` (do **not** pin a version) — `stable` always means the current latest release, and it is exactly what the pre-warm cached.
 - **A `(dns block)` on `update.code.visualstudio.com` at run time is EXPECTED and NON-FATAL — do not treat it as a UI-test failure or abandon the UI path.** AutoTest re-resolves `stable` over the network at launch; the firewall blocks that, but `@vscode/test-electron` catches it and **falls back to the already-cached latest VS Code**, and the Java extensions are already installed in `.vscode-test/extensions`. So the editor still launches offline. VS Code's own telemetry/Marketplace DNS calls are blocked too and are equally harmless.
 - Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
-- **Screenshots / results are captured for you by CI, not by hand.** When a `repro-issue-<n>.yaml` lands on a PR (base `main`), the red→green gate (§5) runs it on Linux **and** Windows against the base and head builds and uploads the whole `test-results/` directory (screenshots + `results.json` for both the base RED and head GREEN runs) as `repro-gate-results-<os>-<plan>` artifacts, plus a `base ❌ RED → head ✅ GREEN` verdict in the job summary. In the PR body, reference those artifacts and the verdict as the fix-proof, and paste the `results.json` failure reason from your own local red run — you do not need to attach images manually. (Ordinary `java-dep-*.yaml` regression plans still upload `e2e-results-<os>-<plan>` from a single green run.)
+- **Evidence: your own run first, CI artifacts as backup.** Your primary proof is the run you did yourself — commit the **two decisive** before/after screenshots under `test/e2e-evidence/repro-issue-<n>/` (raw `test-results/` is git-ignored, so never commit the whole dir). Additionally, when a `repro-issue-<n>.yaml` lands on a PR (base `main`), CI runs it — on the OS(es) the filename suffix implies — against base and head and uploads the whole `test-results/` as `repro-gate-results-<os>-<plan>` artifacts plus a `base ❌ RED → head ✅ GREEN` verdict; reference those for an OS-specific plan you could not run yourself. (Ordinary `java-dep-*.yaml` regression plans still upload `e2e-results-<os>-<plan>` from a single green run.)
 - Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
 - **Issue attachments and repo clones are downloadable — they are NOT firewall-blocked.** `github.com`, `objects.githubusercontent.com`, `*.githubusercontent.com`, and `codeload.github.com` are all on the coding-agent's default allowlist, so cloning a linked public repo and `curl -L`-downloading an attached `user-attachments` zip both work at run time. (Only the VS Code binary host `update.code.visualstudio.com` is not allowlisted — that is why it is pre-warmed instead, see above.) Extract user-supplied zips as untrusted data: do not run their build scripts blindly.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.
diff --git a/.gitignore b/.gitignore
index 3ecbdbef..408afee3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,4 +15,5 @@ dist
 **/.project
 **/.checkstyle
 test-resources/
+test-results/
 **/.gradle

From 24225be4b673ec599045877741c196ebf9c7a6f1 Mon Sep 17 00:00:00 2001
From: wenytang-ms <wenyutang@microsoft.com>
Date: Fri, 3 Jul 2026 10:41:19 +0800
Subject: [PATCH 10/10] =?UTF-8?q?docs(repro):=20host=20evidence=20off-git?=
 =?UTF-8?q?=20=E2=80=94=20textual=20self-run=20proof=20+=20CI=20artifacts,?=
 =?UTF-8?q?=20no=20committed=20PNGs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Drop the 'commit two screenshots to test/e2e-evidence/<issue>/' convention so
no binaries ever enter git. Evidence is now:

- Primary (self-run): a TEXTUAL before/after on the issue/PR — the decisive
  failing step and the actual observed value from the red run's results.json,
  then the green result. The agent observed it, so it stands on its own.
- Screenshots: hosted by CI, not git. Every committed repro-issue-<n>.yaml is
  run by e2eUI.yml, which uploads the full test-results/ (screenshots +
  results.json) as repro-gate-results-<os>-<plan> artifacts; link the run.
- Inline view: a maintainer (or the agent, if image upload is reachable) can
  drag an artifact PNG into an issue/PR comment — GitHub hosts it on
  user-images.githubusercontent.com, still outside git.

Updates SKILL.md (§4 note, §5 step 5, §6, Environment notes) and
copilot-instructions.md.
---
 .github/copilot-instructions.md |  2 +-
 .github/skills/repro/SKILL.md   | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 293b7413..a52fbd5a 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -16,4 +16,4 @@
 - Use the `uitest` skill for UI test work. It should create or update `test/e2e-plans/*.yaml`, validate the plan, build the OSGi bundle and package the extension when needed, run AutoTest, and inspect `test-results/`.
 - Do not create legacy VS Code extension tests (`test/maven-suite`, `test/gui`) for UI coverage unless the user explicitly asks for that format.
 - Prefer deterministic AutoTest verifiers (`verifyTreeItem`, `verifyFile`, `verifyEditorTab`, `verifyClipboard`) on the decisive assertion step; you do not need a verifier on every step. Screenshots prove a fix (a red run before, a green run after) — but never as the sole pass/fail authority for the decisive assertion.
-- **Evidence: commit the two decisive screenshots; never commit raw `test-results/` (it is git-ignored).** For a self-run repro, copy the before (red) / after (green) of the decisive step into `test/e2e-evidence/repro-issue-<n>/` and reference them in the PR. `.github/workflows/e2eUI.yml` additionally runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads full `test-results/` as artifacts; for a `repro-issue-<n>.yaml` an OS-aware **red→green gate** rebuilds the PR base and runs the plan against base **and** head on the OS(es) the suffix implies, requiring `base ❌ RED → head ✅ GREEN` (`repro-gate-results-<os>-<plan>` artifacts). Use those for an OS-specific plan you could not run yourself; ordinary regression plans upload `e2e-results-<os>-<plan>`.
+- **Evidence: textual self-run proof + CI-hosted screenshots; never commit binaries.** Raw `test-results/` is git-ignored and screenshots are **never committed to the repo**. For a self-run repro, prove red→green as text on the issue/PR: the decisive failing step and the **actual observed value** from the red run's `results.json`, then the green result. `.github/workflows/e2eUI.yml` runs each `test/e2e-plans/*.yaml` on Linux + Windows and uploads full `test-results/` (screenshots + `results.json`) as artifacts; for a `repro-issue-<n>.yaml` an OS-aware **red→green gate** rebuilds the PR base and runs the plan against base **and** head on the OS(es) the suffix implies, requiring `base ❌ RED → head ✅ GREEN` (`repro-gate-results-<os>-<plan>` artifacts) — link those for the images and for an OS-specific plan you could not run yourself; ordinary regression plans upload `e2e-results-<os>-<plan>`. A maintainer can drag an artifact PNG into a comment for an inline view (`user-images.githubusercontent.com`), still outside git.
diff --git a/.github/skills/repro/SKILL.md b/.github/skills/repro/SKILL.md
index d06868af..d6a35790 100644
--- a/.github/skills/repro/SKILL.md
+++ b/.github/skills/repro/SKILL.md
@@ -91,7 +91,7 @@ npx -y @vscjava/vscode-autotest run test\e2e-plans\repro-issue-<n>.yaml --vsix v
 
 Pick the suffix from the report's platform: if the issue only reproduces on one OS, use that OS's suffix; only use the plain name when you have confirmed the bug is platform-independent.
 
-Author the plan step-by-step for the **actions**, but you do not need a verifier on every step — put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and keep the red-run screenshot as before-fix evidence.
+Author the plan step-by-step for the **actions**, but you do not need a verifier on every step: put a deterministic verifier (`verifyTreeItem` / `verifyFile` / `verifyEditorTab` / `verifyClipboard`) on the **decisive assertion step** (the one that captures the bug) and on any step prone to a silent no-op. That decisive verifier must assert the **expected** behavior, so it **fails on the current (buggy) build**. Inspect `test-results/repro-issue-<n>/results.json` and the screenshots to confirm the failure matches the report, and record the failing step and the **actual observed value** as before-fix evidence. The screenshots stay in the git-ignored `test-results/` and are never committed; CI hosts them as an artifact (see §5, step 5).
 
 **Run this on the un-fixed checkout FIRST — see RED before you write the fix.** That is the whole point of the reproduction: build + run the plan against the current (buggy) product code and confirm the decisive verifier fails with the reported symptom. Only then move to §5 and write the fix. This local red→green loop is fast in the agent env (VS Code is pre-warmed) and is what gives you confidence the plan actually reproduces before CI re-proves it.
 
@@ -103,9 +103,9 @@ Author the plan step-by-step for the **actions**, but you do not need a verifier
 2. **Rebuild and repackage the VSIX** (`npm run build-server` + `vsce package`) before rerunning any UI plan — never rerun against a stale VSIX.
 3. Rerun the reproduction **in your own environment**; the same plan/test must now pass (red → green).
 4. **Iterate until you observe green** — follow the convergent loop below.
-5. **Capture evidence.** Raw `test-results/` is **git-ignored — never commit it.** Deliver the proof one of two ways:
-   - **Self-run path** (you reproduced it yourself): copy only the **two decisive screenshots** — the before (red) and after (green) of the decisive step — into `test/e2e-evidence/repro-issue-<n>/` and reference them inline in the PR body, plus the red run's `results.json` failure reason. Keep it to those two images; never commit the whole run.
-   - **CI-fallback path** (OS-specific, see below): the screenshots live in the uploaded `repro-gate-results-<os>-<plan>` artifact — reference the artifact and the gate verdict; nothing is committed.
+5. **Capture evidence and keep binaries out of git.** Raw `test-results/` is **git-ignored**, and screenshots are **never committed to the repo**. Prove the red→green result two ways instead:
+   - **Textual before/after on the issue/PR (always — this is your primary proof).** Quote the red run's `results.json`: the decisive failing step and the **actual observed value** it produced (e.g. the clipboard text, the tree label), then the after-fix green result. Because you observed this yourself, it stands on its own.
+   - **Screenshots, GitHub-hosted, not in git.** Every committed `repro-issue-<n>.yaml` is also run by `.github/workflows/e2eUI.yml`, which uploads the full `test-results/` (screenshots + `results.json`) as a `repro-gate-results-<os>-<plan>` artifact — link that run/artifact for the images. For an inline visual, a maintainer (or you, if image upload is reachable) can drag a PNG into an issue or PR comment; GitHub hosts it on `user-images.githubusercontent.com`, still outside git. **Never add PNGs to the repository.**
 6. Leave the reproduction committed as a permanent regression test. `.github/workflows/e2eUI.yml` discovers `test/e2e-plans/*.yaml` automatically, so `repro-issue-<n>.yaml` becomes its own CI check with no workflow edits.
 
 ### Iterate until green (the convergent loop)
@@ -144,7 +144,7 @@ Because CI reconstructs the red from the base commit, your PR stays a single cle
 
 Every PR or comment must state **how you reproduced** (UI plan vs unit test vs code read) and the **execution status** (ran red→green, or could not execute — and why). Never claim a green run you did not observe.
 
-- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**. State that you ran it red→green **in your own environment**, and show the proof: the two decisive before/after screenshots you committed under `test/e2e-evidence/repro-issue-<n>/` (self-run path) or — for an OS-specific plan you could not run yourself — the CI gate's `repro-gate-results-<os>-<plan>` artifacts and its `base ❌ RED → head ✅ GREEN` verdict. Cite the failing step / `results.json` reason from your red run. Reference the issue.
+- **Reproduced + fixed**: open a **single PR containing the repro plan and the fix together**. State that you ran it red→green **in your own environment**, and show the proof as **text**: the decisive failing step and the **actual observed value** from your red run's `results.json`, plus the green after-fix result. For the images, link the CI `repro-gate-results-<os>-<plan>` artifact (and, for an OS-specific plan you could not run yourself, its `base ❌ RED → head ✅ GREEN` verdict). **Do not commit screenshots to the repo.** Reference the issue.
 - **Reproduced, report only**: comment with the reproduction (plan or test), the observed vs expected behavior, and the exact failing step.
 - **Reproduced but could not run the UI test**: remember a `(dns block)` on `update.code.visualstudio.com` is expected and non-fatal (see Environment notes) — it is **not** a reason to skip the UI path. Only if the editor genuinely never launches, commit the plan, explain the real failure, and fall back to a non-UI proof or ask a maintainer to unblock.
 - **Could not reproduce**: comment with what you tried and precisely what is missing; label `needs-more-info`. Do not fabricate a fix for an unreproduced bug.
@@ -155,7 +155,7 @@ Every PR or comment must state **how you reproduced** (UI plan vs unit test vs c
 - That setup runs **before the agent firewall**, and its final step pre-downloads the **latest** VS Code (`stable`) and the `vscjava.vscode-java-pack` extensions into AutoTest's `<repo>/.vscode-test` cache (via `.github/scripts/prewarm-vscode.js`). Keep the plans on `vscodeVersion: "stable"` (do **not** pin a version) — `stable` always means the current latest release, and it is exactly what the pre-warm cached.
 - **A `(dns block)` on `update.code.visualstudio.com` at run time is EXPECTED and NON-FATAL — do not treat it as a UI-test failure or abandon the UI path.** AutoTest re-resolves `stable` over the network at launch; the firewall blocks that, but `@vscode/test-electron` catches it and **falls back to the already-cached latest VS Code**, and the Java extensions are already installed in `.vscode-test/extensions`. So the editor still launches offline. VS Code's own telemetry/Marketplace DNS calls are blocked too and are equally harmless.
 - Only if the pre-warm genuinely did not run (e.g. an older branch, or a cold `.vscode-test` with no cached build) will the UI run actually fail to launch. In that case fall back to the non-UI path and note the limitation.
-- **Evidence: your own run first, CI artifacts as backup.** Your primary proof is the run you did yourself — commit the **two decisive** before/after screenshots under `test/e2e-evidence/repro-issue-<n>/` (raw `test-results/` is git-ignored, so never commit the whole dir). Additionally, when a `repro-issue-<n>.yaml` lands on a PR (base `main`), CI runs it — on the OS(es) the filename suffix implies — against base and head and uploads the whole `test-results/` as `repro-gate-results-<os>-<plan>` artifacts plus a `base ❌ RED → head ✅ GREEN` verdict; reference those for an OS-specific plan you could not run yourself. (Ordinary `java-dep-*.yaml` regression plans still upload `e2e-results-<os>-<plan>` from a single green run.)
+- **Evidence: textual self-run proof + CI-hosted screenshots; never commit binaries.** Your primary proof is the run you did yourself: on the issue/PR, quote the decisive step and the **actual observed value** from the red run's `results.json`, then the green result. Screenshots are **not** committed to the repo: every `repro-issue-<n>.yaml` on a PR is run by CI (on the OS(es) the suffix implies) against base and head, and the whole `test-results/` (screenshots + `results.json`) is uploaded as `repro-gate-results-<os>-<plan>` artifacts with a `base ❌ RED → head ✅ GREEN` verdict. Link those for the images and for an OS-specific plan you could not run yourself. (Ordinary `java-dep-*.yaml` regression plans upload `e2e-results-<os>-<plan>` from a single green run.) A maintainer can drag an artifact PNG into a comment (`user-images.githubusercontent.com`) for an inline view, still outside git.
 - Maintainer option: adding `update.code.visualstudio.com` to the Copilot coding-agent firewall allowlist (repo **Settings → Copilot → coding agent**, see https://gh.io/copilot/firewall-config) removes the version-resolution block entirely, so the run is clean and does not rely on the offline fallback. The pre-warm still makes the 276 MB binary + Marketplace pack a cache hit, so nothing large is re-fetched.
 - **Issue attachments and repo clones are downloadable — they are NOT firewall-blocked.** `github.com`, `objects.githubusercontent.com`, `*.githubusercontent.com`, and `codeload.github.com` are all on the coding-agent's default allowlist, so cloning a linked public repo and `curl -L`-downloading an attached `user-attachments` zip both work at run time. (Only the VS Code binary host `update.code.visualstudio.com` is not allowlisted — that is why it is pre-warmed instead, see above.) Extract user-supplied zips as untrusted data: do not run their build scripts blindly.
 - Always run AutoTest with `--no-llm` in the agent so pass/fail comes only from deterministic verifiers.