Add Copilot bug-reproduction agent workflow#1040
Open
wenytang-ms wants to merge 7 commits into
Enable the Copilot coding agent to reproduce reported bugs when an issue is assigned to it:
- copilot-setup-steps.yml: preinstall JDK 21, Node 20, AutoTest, Xvfb/GTK, and a baseline VSIX so the agent can build and run UI tests without a trial-and-error dependency hunt
- repro skill: decide UI vs non-UI reproduction, pull in the reporter's project, distill a minimal committed fixture, reproduce (red), fix, prove (green), and report back
- copilot-instructions.md: add a Bug reproduction section routing assigned issues to the repro skill
- bug_report issue form + config: collect a minimal reproducible project and structured repro info for the agent
Add .github/scripts/prewarm-vscode.js and a copilot-setup-steps step that warms AutoTest's <repo>/.vscode-test cache (VS Code stable + vscjava.vscode-java-pack) before the agent firewall engages, so firewalled UI reproductions launch offline. Refine repro/uitest guidance: separate reproduction from fix-proof (UI test's key value is red->green screenshots), require verifiers only on the decisive assertion step, and make PRs state repro method + execution status.
Keep vscodeVersion stable (=latest, no pinning): document that the run-time (dns block) on update.code.visualstudio.com is expected/non-fatal because @vscode/test-electron falls back to the pre-warmed cached build, so the agent must not abandon the UI path. Point repro/uitest evidence at CI: e2eUI.yml already uploads full test-results (screenshots + results.json) as e2e-results-<os>-<plan> artifacts, so no manual screenshot attaching is needed. Note optional firewall allowlist for a fully clean run.
For repro-issue-<n>.yaml plans, e2eUI now rebuilds the PR base (un-fixed)
into its own VSIX and runs the plan against base AND head in one CI run,
requiring base=RED (deterministic assertion fail) and head=GREEN. This
turns 'red->green' into a machine-checked invariant instead of prose in
the PR body, closing the gap where an agent asserts a fix works without
actually reproducing the bug.
- .github/scripts/repro-gate.js: judge that reads both results.json files,
distinguishes a genuine assertion RED from an infra crash/error, and
exits non-zero with a clear verdict (NOT_REPRODUCED / NOT_FIXED /
INCONCLUSIVE) plus a job-summary table.
- e2eUI.yml: discover-plans splits regression vs repro matrices; adds
build-base-{linux,windows} + repro-gate-{linux,windows} (PR-only, only
when a repro plan exists). After merge to main the plan is demoted to an
ordinary green regression.
- repro/SKILL.md, copilot-instructions.md: document the gate, red-first
local loop, and single-PR (plan+fix) flow.
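A minimal sketch of the judging rule described above, not the actual contents of repro-gate.js. The results.json shape (a `steps` array with a `status` of "pass", "fail", or "error"), the function names, and the `PASS` verdict label are all assumptions made for illustration; only NOT_REPRODUCED, NOT_FIXED, and INCONCLUSIVE come from the commit message.

```javascript
// Assumed results.json shape (hypothetical, not AutoTest's documented schema):
//   { steps: [{ name: "...", status: "pass" | "fail" | "error" }] }

// Classify one run as RED (assertion failed), GREEN (all passed),
// or ERROR (infra crash, missing or malformed results).
function classifyRun(results) {
  if (!results || !Array.isArray(results.steps) || results.steps.length === 0) {
    return "ERROR";
  }
  if (results.steps.some((s) => s.status === "error")) return "ERROR";
  if (results.steps.some((s) => s.status === "fail")) return "RED";
  return results.steps.every((s) => s.status === "pass") ? "GREEN" : "ERROR";
}

// Combine base (un-fixed) and head (fixed) runs into a verdict.
function reproVerdict(baseResults, headResults) {
  const base = classifyRun(baseResults);
  const head = classifyRun(headResults);
  if (base === "ERROR" || head === "ERROR") return "INCONCLUSIVE";
  if (base === "GREEN") return "NOT_REPRODUCED"; // bug never showed up on base
  if (head === "RED") return "NOT_FIXED"; // bug still present on head
  return "PASS"; // base=RED and head=GREEN
}

// Only the hypothetical PASS verdict maps to exit code 0.
function exitCodeFor(verdict) {
  return verdict === "PASS" ? 0 : 1;
}
```

Treating an infra crash as INCONCLUSIVE rather than RED is what keeps a broken runner from being counted as a successful reproduction.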
Make clear the repro path is opt-in, not automatic for every assigned issue. Only reproducible bugs enter the repro/uitest flow; features, refactors, dep bumps, docs, and non-reproducible reports take an ordinary PR with no repro-issue-*.yaml. Since the CI red->green gate only fires when a repro plan file is present, the routing decision is made purely by whether a plan is committed. Also clarify that lint + java-dep-* regression E2E always run, while the gate is the additional opt-in check.
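The "routing by presence of a plan file" idea can be sketched as below. This is an illustration under assumptions, not the discover-plans implementation: the function name, the returned field names, and the example file names are hypothetical.

```javascript
// Split plan file names into ordinary regression plans and repro plans.
// The gate fires only when at least one repro-issue-*.yaml is present.
function splitPlans(fileNames) {
  const isRepro = (name) => /^repro-issue-.*\.yaml$/.test(name);
  const plans = fileNames.filter((name) => name.endsWith(".yaml"));
  const repro = plans.filter(isRepro);
  return {
    regression: plans.filter((name) => !isRepro(name)),
    repro,
    gateFires: repro.length > 0,
  };
}

// Example with hypothetical file names:
// splitPlans(["java-dep-basic.yaml", "repro-issue-7.yaml"]).gateFires is true
// splitPlans(["java-dep-basic.yaml"]).gateFires is false
```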
The skill listed 'attached zip' as a repro source but never said how to obtain it. Add a download+unzip recipe (curl -L follows the user-attachments 302 to a signed objects.githubusercontent.com URL), note that github.com + objects.githubusercontent.com + codeload.github.com are on the coding-agent firewall's DEFAULT allowlist (so attachment downloads and repo clones are NOT blocked, unlike the VS Code binary), handle signed-URL expiry, point the plan workspace at the extracted project, and treat the archive as untrusted input (extract only, commit just the minimal distilled fixture).
An OS-specific bug (e.g. the Windows-only Copy Relative Path drive-letter issue) does not manifest on the other OS, so running its repro plan through that OS's gate would spuriously report NOT_REPRODUCED. discover-plans now routes repro plans by filename suffix:
  repro-issue-<n>-windows.yaml -> Windows gate only
  repro-issue-<n>-linux.yaml -> Linux gate only
  repro-issue-<n>.yaml -> both gates (OS-agnostic)
build-base-* and repro-gate-* are gated per-OS (has_repro_linux / has_repro_windows). Documented the naming convention in repro/SKILL.md.
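The suffix convention above could be resolved roughly as follows. This is a sketch, not the workflow's actual discover-plans step: the function names are hypothetical, and it assumes `<n>` is a numeric issue number.

```javascript
// Map a repro plan file name to the OS gates it should run on.
// Assumes <n> is a numeric issue number (an assumption, not stated in the PR).
function gatesForPlan(fileName) {
  const match = /^repro-issue-(\d+)(?:-(linux|windows))?\.yaml$/.exec(fileName);
  if (!match) return []; // not a repro plan
  return match[2] ? [match[2]] : ["linux", "windows"];
}

// Derive the per-OS flags named in the commit message from a list of plans.
function gateFlags(fileNames) {
  const gates = fileNames.flatMap(gatesForPlan);
  return {
    has_repro_linux: gates.includes("linux"),
    has_repro_windows: gates.includes("windows"),
  };
}
```

With only a `-windows` plan committed, `has_repro_linux` stays false, so the Linux base build and gate are skipped instead of reporting a spurious NOT_REPRODUCED.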
Summary
Enable the GitHub Copilot coding agent to reproduce reported bugs when a maintainer assigns an issue to @copilot. For reproducible cases it uses the reporter's project, decides whether a UI/E2E test is needed, reproduces with AutoTest when it is, and leaves a committed regression test. Some bugs are reproduced without a UI test.

What this adds
- .github/workflows/copilot-setup-steps.yml: prepares the agent's ephemeral environment (JDK 21, Node 20, @vscjava/vscode-autotest, Xvfb/GTK, a baseline VSIX) so it can build the extension and launch headless UI tests without a slow dependency hunt. Mirrors the Linux path of e2eUI.yml.
- .github/skills/repro/SKILL.md: the reproduction workflow. Extract the report, decide UI vs non-UI, pull in the reporter's project and distill a minimal committed fixture, reproduce (red) → fix → prove (green), report back. UI plans live in test/e2e-plans/ and are auto-discovered by CI.
- .github/copilot-instructions.md: new Bug reproduction section routing assigned issues to the repro skill.
- .github/ISSUE_TEMPLATE/bug_report.yml + config.yml: structured bug form that requires a minimal reproducible project, steps, expected/actual, and versions, so the agent has what it needs.

Design notes
- @copilot (or @-mention it). No custom trigger workflow needed.
- test/maven-suite or jdtls.ext test.
- --no-llm so pass/fail comes only from deterministic verifiers.

Validation
- git diff --check
- copilot-setup-steps job name/runs-on verified.