add qa-flow, consolidate QA/AQA/testgen skills, add manual tests #110
sveto wants to merge 39 commits into
Rosetta Triage Review
Summary: This PR consolidates and significantly refactors the QA, AQA, and Testgen AI agent workflows — introducing a brand-new end-to-end
Findings: [CRITICAL] [HIGH] Multiple files — Frontmatter description exceeds 30-token cap
[HIGH] [HIGH] [HIGH] [MEDIUM] DRY — Duplicated validation rules in [MEDIUM] [MEDIUM] [POSITIVE] CI improvement in [POSITIVE] Manual test docs Suggestions:
Automated triage by Rosetta agent
Conflicts:
- instructions/r2/core/skills/debugging/SKILL.md
- instructions/r2/core/skills/orchestrator-contract/SKILL.md
- instructions/r2/core/skills/requirements-authoring/SKILL.md
- instructions/r2/core/workflows/aqa-flow-data-collection.md
- instructions/r2/core/workflows/aqa-flow-requirements-clarification.md
- instructions/r2/core/workflows/aqa-flow-selector-identification.md
- instructions/r2/core/workflows/aqa-flow-test-correction.md
- instructions/r2/core/workflows/aqa-flow-test-implementation.md
- instructions/r2/core/workflows/aqa-flow-test-report-analysis.md
- instructions/r2/core/workflows/aqa-flow.md
- instructions/r2/core/workflows/testgen-flow-data-collection.md
- instructions/r2/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
- instructions/r2/core/workflows/testgen-flow-project-config-loading.md
- instructions/r2/core/workflows/testgen-flow-question-generation.md
- instructions/r2/core/workflows/testgen-flow-requirements-document-generation.md
- instructions/r2/core/workflows/testgen-flow-test-case-export.md
- instructions/r2/core/workflows/testgen-flow-test-case-generation.md
- instructions/r2/core/workflows/testgen-flow.md
- instructions/r3/core/skills/debugging/SKILL.md
- instructions/r3/core/skills/orchestrator-contract/SKILL.md
- instructions/r3/core/skills/requirements-authoring/SKILL.md
- instructions/r3/core/workflows/aqa-flow-data-collection.md
- instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
- instructions/r3/core/workflows/aqa-flow-selector-identification.md
- instructions/r3/core/workflows/aqa-flow-test-correction.md
- instructions/r3/core/workflows/aqa-flow-test-implementation.md
- instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
- instructions/r3/core/workflows/aqa-flow.md
- instructions/r3/core/workflows/testgen-flow-data-collection.md
- instructions/r3/core/workflows/testgen-flow-gap-and-contradiction-analysis.md
- instructions/r3/core/workflows/testgen-flow-project-config-loading.md
- instructions/r3/core/workflows/testgen-flow-question-generation.md
- instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
- instructions/r3/core/workflows/testgen-flow-test-case-export.md
- instructions/r3/core/workflows/testgen-flow-test-case-generation.md
- instructions/r3/core/workflows/testgen-flow.md
- plugins/core-claude/hooks/hooks.json
- plugins/core-claude/skills/debugging/SKILL.md
- plugins/core-claude/skills/orchestrator-contract/SKILL.md
- plugins/core-claude/skills/requirements-authoring/SKILL.md
- plugins/core-claude/workflows/INDEX.md
- plugins/core-claude/workflows/aqa-flow-data-collection.md
- plugins/core-claude/workflows/aqa-flow-requirements-clarification.md
- plugins/core-claude/workflows/aqa-flow-selector-identification.md
- plugins/core-claude/workflows/aqa-flow-test-correction.md
- plugins/core-claude/workflows/aqa-flow-test-implementation.md
- plugins/core-claude/workflows/aqa-flow-test-report-analysis.md
- plugins/core-claude/workflows/aqa-flow.md
- plugins/core-claude/workflows/testgen-flow-data-collection.md
- plugins/core-claude/workflows/testgen-flow-gap-and-contradiction-analysis.md
- plugins/core-claude/workflows/testgen-flow-project-config-loading.md
- plugins/core-claude/workflows/testgen-flow-question-generation.md
- plugins/core-claude/workflows/testgen-flow-requirements-document-generation.md
- plugins/core-claude/workflows/testgen-flow-test-case-export.md
- plugins/core-claude/workflows/testgen-flow-test-case-generation.md
- plugins/core-claude/workflows/testgen-flow.md
- plugins/core-codex/.agents/skills/debugging/SKILL.md
- plugins/core-codex/.agents/skills/orchestrator-contract/SKILL.md
- plugins/core-codex/.agents/skills/requirements-authoring/SKILL.md
- plugins/core-codex/.agents/workflows/INDEX.md
- plugins/core-codex/.agents/workflows/aqa-flow-data-collection.md
- plugins/core-codex/.agents/workflows/aqa-flow-requirements-clarification.md
- plugins/core-codex/.agents/workflows/aqa-flow-selector-identification.md
- plugins/core-codex/.agents/workflows/aqa-flow-test-correction.md
- plugins/core-codex/.agents/workflows/aqa-flow-test-implementation.md
- plugins/core-codex/.agents/workflows/aqa-flow-test-report-analysis.md
- plugins/core-codex/.agents/workflows/aqa-flow.md
- plugins/core-codex/.agents/workflows/testgen-flow-data-collection.md
- plugins/core-codex/.agents/workflows/testgen-flow-gap-and-contradiction-analysis.md
- plugins/core-codex/.agents/workflows/testgen-flow-project-config-loading.md
- plugins/core-codex/.agents/workflows/testgen-flow-question-generation.md
- plugins/core-codex/.agents/workflows/testgen-flow-requirements-document-generation.md
- plugins/core-codex/.agents/workflows/testgen-flow-test-case-export.md
- plugins/core-codex/.agents/workflows/testgen-flow-test-case-generation.md
- plugins/core-codex/.agents/workflows/testgen-flow.md
- plugins/core-codex/.codex-plugin/hooks.json
- plugins/core-codex/.codex/hooks.json
- plugins/core-copilot-standalone/.github/instructions/plugin-files-mode.instructions.md
- plugins/core-copilot-standalone/.github/prompts/INDEX.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-data-collection.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-requirements-clarification.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-selector-identification.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-test-correction.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-test-implementation.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow-test-report-analysis.prompt.md
- plugins/core-copilot-standalone/.github/prompts/aqa-flow.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-data-collection.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-gap-and-contradiction-analysis.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-project-config-loading.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-question-generation.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-requirements-document-generation.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-test-case-export.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow-test-case-generation.prompt.md
- plugins/core-copilot-standalone/.github/prompts/testgen-flow.prompt.md
- plugins/core-copilot-standalone/.github/skills/debugging/SKILL.md
- plugins/core-copilot-standalone/.github/skills/orchestrator-contract/SKILL.md
- plugins/core-copilot-standalone/.github/skills/requirements-authoring/SKILL.md
- plugins/core-copilot/.github/plugin/hooks.json
- plugins/core-copilot/commands/INDEX.md
- plugins/core-copilot/commands/aqa-flow-data-collection.md
- plugins/core-copilot/commands/aqa-flow-requirements-clarification.md
- plugins/core-copilot/commands/aqa-flow-selector-identification.md
- plugins/core-copilot/commands/aqa-flow-test-correction.md
- plugins/core-copilot/commands/aqa-flow-test-implementation.md
- plugins/core-copilot/commands/aqa-flow-test-report-analysis.md
- plugins/core-copilot/commands/aqa-flow.md
- plugins/core-copilot/commands/testgen-flow-data-collection.md
- plugins/core-copilot/commands/testgen-flow-gap-and-contradiction-analysis.md
- plugins/core-copilot/commands/testgen-flow-project-config-loading.md
- plugins/core-copilot/commands/testgen-flow-question-generation.md
- plugins/core-copilot/commands/testgen-flow-requirements-document-generation.md
- plugins/core-copilot/commands/testgen-flow-test-case-export.md
- plugins/core-copilot/commands/testgen-flow-test-case-generation.md
- plugins/core-copilot/commands/testgen-flow.md
- plugins/core-copilot/hooks.json
- plugins/core-copilot/skills/debugging/SKILL.md
- plugins/core-copilot/skills/orchestrator-contract/SKILL.md
- plugins/core-copilot/skills/requirements-authoring/SKILL.md
- plugins/core-cursor-standalone/.cursor/commands/INDEX.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-data-collection.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-requirements-clarification.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-selector-identification.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-test-correction.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-test-implementation.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow-test-report-analysis.md
- plugins/core-cursor-standalone/.cursor/commands/aqa-flow.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-data-collection.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-gap-and-contradiction-analysis.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-project-config-loading.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-question-generation.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-requirements-document-generation.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-test-case-export.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow-test-case-generation.md
- plugins/core-cursor-standalone/.cursor/commands/testgen-flow.md
- plugins/core-cursor-standalone/.cursor/rules/plugin-files-mode.mdc
- plugins/core-cursor-standalone/.cursor/skills/debugging/SKILL.md
- plugins/core-cursor-standalone/.cursor/skills/orchestrator-contract/SKILL.md
- plugins/core-cursor-standalone/.cursor/skills/requirements-authoring/SKILL.md
- plugins/core-cursor/commands/INDEX.md
- plugins/core-cursor/commands/aqa-flow-data-collection.md
- plugins/core-cursor/commands/aqa-flow-requirements-clarification.md
- plugins/core-cursor/commands/aqa-flow-selector-identification.md
- plugins/core-cursor/commands/aqa-flow-test-correction.md
- plugins/core-cursor/commands/aqa-flow-test-implementation.md
- plugins/core-cursor/commands/aqa-flow-test-report-analysis.md
- plugins/core-cursor/commands/aqa-flow.md
- plugins/core-cursor/commands/testgen-flow-data-collection.md
- plugins/core-cursor/commands/testgen-flow-gap-and-contradiction-analysis.md
- plugins/core-cursor/commands/testgen-flow-project-config-loading.md
- plugins/core-cursor/commands/testgen-flow-question-generation.md
- plugins/core-cursor/commands/testgen-flow-requirements-document-generation.md
- plugins/core-cursor/commands/testgen-flow-test-case-export.md
- plugins/core-cursor/commands/testgen-flow-test-case-generation.md
- plugins/core-cursor/commands/testgen-flow.md
- plugins/core-cursor/skills/debugging/SKILL.md
- plugins/core-cursor/skills/orchestrator-contract/SKILL.md
- plugins/core-cursor/skills/requirements-authoring/SKILL.md
`qa-flow`: if it's the API flow, it may not be obvious to users that API testing requires triggering qa-flow while UI testing requires aqa-flow. I think we need to give the flows proper names. Please rename them.

Why were the r2 instructions changed? Do we need a fix on prod?
mkuznietsov left a comment
Please fix the review comments.
I'm not quite sure. My initial plan was not to touch r2 at all, but that configuration does not pass the pre-commit hooks.
Conflicts:
- docs/definitions/skills.md
- instructions/r3/core/skills/testing/SKILL.md
- instructions/r3/core/workflows/aqa-flow-code-analysis.md
- instructions/r3/core/workflows/aqa-flow-data-collection.md
- instructions/r3/core/workflows/aqa-flow-requirements-clarification.md
- instructions/r3/core/workflows/aqa-flow-selector-identification.md
- instructions/r3/core/workflows/aqa-flow-selector-implementation.md
- instructions/r3/core/workflows/aqa-flow-test-correction.md
- instructions/r3/core/workflows/aqa-flow-test-implementation.md
- instructions/r3/core/workflows/aqa-flow-test-report-analysis.md
- instructions/r3/core/workflows/aqa-flow.md
Follows the QA/AQA -> API-QA/UI-QA rename; the example prompt arrived via the main merge still referencing the old command name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ce to .md file which was renamed to .xml
Present-and-wait gate before an irreversible / high-stakes step (apply corrections, approve a spec/plan). The calling phase supplies: the **closed approval-token list**, the **re-present step**, and the **revisit target** on full reject. Approval vocabulary is governed by `hitl` — this is its QA-flow specialization; the phase's closed token list is authoritative for that phase.

1. Present the artifact for review; **WAIT** for explicit approval. Do NOT assume approval; a message containing questions or suggestions is reviewing, not approving.
2. **Approval = an exact closed token** (case-insensitive — `APPROVED` / `Approve` / `yes` all match the lowercase token). Anything else — `"looks good"`, `"ship it"`, `"LGTM"`, `"sounds good"`, `"go ahead"`, `"OK"`, `"go"`, a question, a suggestion, or silence — is **REVIEW, not approval**: re-prompt for an exact token. The list is **closed**; "or similar" / "etc." extension language in other loaded rules does NOT extend it.
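The closed-token rule in step 2 can be sketched as a small classifier. This is a minimal Python illustration, not code from the PR: `APPROVAL_TOKENS` is a hypothetical phase-supplied list, and ignoring leading/trailing whitespace is an assumption. In the flow itself, the calling phase's own closed list stays authoritative.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical closed token list; in the flow, each calling phase supplies its own.
APPROVAL_TOKENS = frozenset({"approved", "approve", "yes"})


def classify_reply(reply: Optional[str], tokens: frozenset = APPROVAL_TOKENS) -> str:
    """Return "APPROVAL" only for an exact, case-insensitive closed token.

    Anything else ("looks good", "LGTM", "OK", a question, a suggestion,
    or no reply at all) is "REVIEW", and the caller should re-prompt.
    """
    if reply is None:  # silence is review, not approval
        return "REVIEW"
    # Assumption: leading/trailing whitespace is ignored; nothing else is normalized.
    candidate = reply.strip().casefold()
    return "APPROVAL" if candidate in tokens else "REVIEW"


if __name__ == "__main__":
    for msg in ["APPROVED", "Approve", "yes", "looks good", "LGTM", None]:
        print(repr(msg), "->", classify_reply(msg))
```

The token set is passed in rather than matched by similarity, which mirrors the rule that "or similar" language elsewhere does not widen the list.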
I’m thinking these instructions could be part of the HITL skill. Why were they defined in a separate file?

"Approval vocabulary is governed by hitl — this is its QA-flow specialization". The gate takes its own parameters, which differ for each phase; hitl does not accept parameters.
That said, steps 1-2 did repeat hitl in many ways. I've tried to fix that.
</resources>

</requirements-authoring>

This skill is now more than 500 lines long. I assume that exceeds the recommended size. Maybe it's worth making it smaller?

Yes, shortening the file is not exactly in the scope of this PR, but the time has come. Please take a look at the result.
Rosetta Instruction Quality Review — Description Hardening
Confirming @YevheniiaLementova's finding. The frontmatter
Current description: Concrete issues:
Comparison with existing skills:
Suggested fix (~13 tokens, call-to-action):
Automated triage by Rosetta agent
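The 30-token description cap from the triage finding could be checked mechanically before review. A rough Python sketch, assuming the description sits on a single `description:` frontmatter line; the regex token count is only a proxy (real model tokenizers split text differently), and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

DESCRIPTION_TOKEN_CAP = 30  # cap cited in the triage finding

# Matches a single-line `description:` key, with or without surrounding quotes.
_DESCRIPTION = re.compile(r'^description:\s*"?(.*?)"?\s*$', re.MULTILINE)
# Rough token proxy: runs of word characters, or single punctuation marks.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def description_tokens(frontmatter: str) -> Optional[int]:
    """Approximate token count of the frontmatter description, or None if absent."""
    match = _DESCRIPTION.search(frontmatter)
    if match is None:
        return None
    return len(_TOKEN.findall(match.group(1)))


def exceeds_cap(frontmatter: str, cap: int = DESCRIPTION_TOKEN_CAP) -> bool:
    """True when a description exists and its approximate token count is over the cap."""
    count = description_tokens(frontmatter)
    return count is not None and count > cap
```

Because the proxy undercounts subword splits, a check like this is better treated as an early warning than as the final word on the cap.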
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
isolomatov-gd left a comment
Please check the comments, understand the failure mode, and FIX that failure across the board, NOT only in the files provided.
<when_to_use_skill>
Use when encountering errors, test failures, unexpected behavior, or when a previous fix failed and the issue persists. Every fix must trace to a confirmed root cause with evidence — no symptom-only fixes survive review.
Use when encountering errors, test failures, unexpected behavior, or when a previous fix failed and the issue persists. For an automated-test execution report (UI or API), use the test-execution triage mode in `<test_execution_triage>`. Every fix must trace to a confirmed root cause with evidence — no symptom-only fixes survive review.

Suggested change:
Use when encountering errors, test failures, unexpected behavior, or when a previous fix failed and the issue persists, or for triaging. Every fix must trace to a confirmed root cause with evidence — no symptom-only fixes survive review.
- ALWAYS find root cause before attempting fixes; symptom fixes are failure
- Make implicit become explicit — incorrect assumptions hide root causes
- Evidence label per cause — `Confirmed` (both sides cited) | `Assumption` (partial; state the missing evidence) | `Unknown` (none; state what is needed); the weaker label wins ties
- Redaction of captured logs, requests, responses, or page sources → USE SKILL `sensitive-data` (canonical authority)

Suggested change:
- Redaction of captured logs, requests, responses, or page sources → USE SKILL `sensitive-data`

Remove noise that is not needed.
<test_execution_triage>

Read-only triage of an automated-test execution report. The caller supplies three bindings: the report path, the failure-category taxonomy to use, and the output-artifact contract.

Suggested change:
Read-only triage of an automated test execution report.

The rest belongs to the actual workflow!
1. Parse the report — per-test status, error message, stack trace, duration, and captured artifacts (screenshots, page source, request/response).

Suggested change:
1. Analyze the report — per-test status, error message, stack trace, duration, and captured artifacts (screenshots, page source, request/response).
</implementation>

<test_execution_triage>

This part is tightly coupled with the workflow.
The workflow KNOWS how to call the skill and calls it; the SKILL does NOT know the workflow. A SKILL is universal functionality that a workflow wants to reuse.
`@@ -0,0 +1,109 @@`
`---`

Why do we need frontmatter here?
`@@ -0,0 +1,64 @@`
`---`
`name: qa-knowledge`
`description: "Rosetta — QA test-automation conventions: failure taxonomies, redaction scope, authoring & correction discipline, and the artifact skeletons each phase emits."`

The description does not follow the schema.
<when_to_use_skill>

Activate inside any API-QA or UI-QA flow phase that authors, analyzes, or corrects tests and needs the QA-domain conventions general skills don't own — failure taxonomies, redaction scope, assertion/coverage discipline, selector & page-object rules, and the artifact skeletons each phase emits. This is the HOW layer; WHERE artifacts live is owned by `qa-structure`.

- Remove meta-reasoning about why, when, or what. A skill only says what should be done, when, and the reasoning needed to make decisions; it is not self-describing, except for its description field.
- A skill must not know which workflow is using it, or even whether a workflow was used.
- Describing parameters and what is expected FROM THE CALLER inside the skill is too late.
- A skill is universal and independent.
- The workflow calls skills and tells them what to do.
- Flow control, gates, etc. belong to the workflow.
<core_concepts>

- All Rosetta prep steps MUST be FULLY completed, load-context skill loaded and fully executed
- This skill carries only QA-specific conventions; generic collection, analysis, authoring, triage, and redaction mechanics are owned by the phase's other loaded skills and are not restated here.

<api-qa-project-config-template>

Written to the canonical path `agents/api-qa/api-qa-project-config.md` (project-wide; shared across every API-QA session for this project). Populate each section from the user's answers. **`qa-structure/references/config-schema.md` is the single authority for which keys are required and their accepted values / `N/A — <reason>` forms — not restated per-field below;** the required keys carry `[per config-schema]` placeholders. Mark optional fields `TBD — <reason>` when discovery is intentionally deferred.

Paths must follow the Rosetta folder structure. This stuff belongs in the FEATURE PLAN folder. Reason: conflicting changes (imagine somebody runs it twice in separate sessions, or commits the files).
📋 Prompt Quality Validation Report
❌ Validation Failed
The full markdown report and raw JSON output are available in the workflow artifacts for 5 days.
Files With Issues
📄
| Severity | Gate | Details |
|---|---|---|
| Medium | Bloat Control | Problem: The new <rosetta_canonical_lists> block (the five docs/definitions/*.md lists) was moved out of the on-demand reference pa-rosetta.md and into the always-loaded SKILL.md. Every prompt-authoring session, including non-Rosetta targets, now carries this Rosetta-only text even though it is guarded by 'skip for any other system'. Reason: Progressive disclosure keeps domain-specific detail in references so the base skill stays small; resident Rosetta-only content is minor bloat for non-Rosetta authoring. Solution: Keep the canonical-lists content in the on-demand pa-rosetta.md reference (already acquired for Rosetta prompts per core_concepts) rather than resident in SKILL.md, or reduce the inline block to a one-line pointer. |
📄 instructions/r3/core/skills/reverse-engineering/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| High | Single Responsibility | Problem: The new mode 'test-automation architecture analysis' inventories implementation-level assets — 'framework + language', 'page objects (what each represents, selectors, methods)', 'shared utilities' — which is HOW-level detail. That conflicts with the skill's core thesis (core-concept 1: 'Code tells you how; a spec captures what and why') and pushes the skill from spec-recovery toward test-asset discovery, a second responsibility. Reason: A skill built to filter out implementation detail should not also own an implementation-asset inventory; mixing them dilutes the single responsibility and can confuse the agent about whether to abstract or transcribe. Solution: Either scope this mode strictly to recovering test-suite intent/domain behavior (WHAT/WHY), or move the reusable-asset inventory into the testing or codemap skill and have reverse-engineering only supply the recovered intent. Keep the API-contract-extraction mode, which is genuinely contract recovery. |
📄 instructions/r3/core/skills/testing/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Output Contract | Problem: The modes say 'The caller supplies the concrete inputs (... output target ...)' and the reference states 'The calling workflow PHASE owns the artifact ... output contract', yet the same modes hard-name exact record sections the caller's artifact must contain — 'the caller's ### Uncovered Assertions record' and 'the caller's ## Selector Management record'. Ownership of the section headings is ambiguous. Reason: If the caller's artifact uses different heading names, an agent following the skill can write to a section that does not exist, breaking the hand-off. Solution: State plainly that the skill defines the record schema (these heading names) while the caller supplies the file path/location, or make the heading names caller-parameterized. Pick one and say so. |
📄 instructions/r3/core/skills/discovery/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| High | Goal Specification | Problem: when_to_use promises a capability the skill body does not deliver. It says load this skill to 'scan the codebase, and assemble a normalized raw-context artifact', but data_collection states 'The single mode of this skill: collect from one or more vendor sources' and every step and binding covers only Jira/Confluence/TestRail. There is no codebase-scan procedure, and the frontmatter description omits codebase entirely (it names only Jira/Confluence/TestRail). Reason: An agent that loads discovery to scan a codebase finds no instructions and will either improvise an unspecified scan or stall, producing an inconsistent artifact for a use case the skill advertises but cannot fulfill. Solution: Either remove 'or scan the codebase' from when_to_use so the in-scope statement matches the vendor-only body and the description, or add an explicit codebase-scan mode/step (or a codebase binding) so the promised capability has a procedure. |
📄 instructions/r3/core/skills/qa-knowledge/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Structural Coherence | Problem: when_to_use claims the skill owns 'assertion/coverage discipline, selector & page-object rules' as QA conventions, but the resources router lists no asset or reference that surfaces those two topics. An agent that activates the skill to get selector/page-object or assertion/coverage guidance has no ACQUIRE target to reach it. Reason: A skill that advertises a convention it cannot route to leaves the agent guessing where the rule lives, weakening the point-of-use progressive-disclosure contract. Solution: Either add a router row (asset or reference) that delivers assertion/coverage and selector/page-object conventions, or drop those two claims from when_to_use so the promised scope matches what the router actually routes to. |
📄 instructions/r3/core/skills/qa-knowledge/assets/test-spec-template.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Success Criteria | Problem: This template has no explicit 'Done when' completion gate. Its sibling assets (gap-finding-templates, ui-qa-plan-template, ui-qa-test-impl-record) each end with a testable done-when line, but test-spec-template only lists per-section 'Required content'. An agent could emit a spec that still holds unfilled placeholders and treat it as complete. Reason: The other four templates in this skill define a checkable completion contract; the missing gate here weakens the emit contract and lets incomplete specs pass silently, hurting reliability. Solution: Add a short 'Done when' line, e.g. every ATC-NNN has a Test File Mapping row, no bracket placeholders remain, and each Assumption cites a source. Mirror the completion-gate phrasing the sibling assets use. |
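A "Done when" gate like the one suggested above could also be checked mechanically. A minimal Python sketch under three assumptions about the template: ATC IDs have three digits, the mapping lives under a `## Test File Mapping` heading, and any `[...]` not followed by `(` is an unfilled placeholder. The rule that each Assumption cites a source is not covered here.

```python
from __future__ import annotations

import re

ATC_ID = re.compile(r"\bATC-\d{3}\b")
# Assumed placeholder shape: "[text]" that is not a markdown link "[text](url)".
PLACEHOLDER = re.compile(r"\[[^\]\n]+\](?!\()")
MAPPING_HEADING = "## Test File Mapping"  # assumed heading text


def done_when_violations(spec: str) -> list[str]:
    """List reasons a test spec is not done; an empty list means the gate passes."""
    # Simplification: everything after the heading counts as the mapping section.
    body, found, mapping = spec.partition(MAPPING_HEADING)
    problems: list[str] = []
    if not found:
        problems.append(f"missing '{MAPPING_HEADING}' section")
    defined = set(ATC_ID.findall(body))
    mapped = set(ATC_ID.findall(mapping))
    for atc in sorted(defined - mapped):
        problems.append(f"{atc} has no Test File Mapping row")
    for placeholder in PLACEHOLDER.findall(spec):
        problems.append(f"unfilled placeholder {placeholder}")
    return problems
```

Returning a list of reasons, rather than a boolean, lets the agent report exactly which ATC rows or placeholders still block completion.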
📄 instructions/r3/core/skills/qa-knowledge/references/redaction-scope.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| High | Self-Validation | Problem: The pre-emit re-scan grep list does not cover several sensitive-value locations the redaction targets themselves name. Target 1 lists secrets appearing in X-Api-Key and Cookie examples, but the grep list has no X-Api-Key: or Cookie: pattern (only api_key= query style). Target 2 requires redacting generic credentialed URLs (https://user:pass@host) and signed-URL signatures (?sig=), but the grep list covers only postgresql:// and mongodb+srv:// credentialed forms and has no sig= pattern. Target 3 requires redacting service-account JSONs and certificates, but the grep list has BEGIN PRIVATE KEY / BEGIN RSA PRIVATE KEY only (no BEGIN CERTIFICATE and no service-account JSON marker). Reason: This reference is safety-critical: the grep list is the automated pre-emit backstop against leaking secrets into a PUBLIC-by-default tracked artifact. Because it is narrower than the redaction targets it claims to enforce, a secret in an X-Api-Key/Cookie header, a generic credentialed URL, a signed URL, a certificate, or a service-account JSON can pass the re-scan gate undetected. Solution: Extend the re-scan grep list so it enforces every declared target: add X-Api-Key:, Cookie:, a generic user:pass@ credentialed-URL pattern (not only the two DB schemes), sig=, BEGIN CERTIFICATE, and a service-account JSON marker (e.g. "private_key" / "type": "service_account"). Keep the target list and the grep list in one-to-one correspondence so the automated backstop matches the stated scope. |
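The one-to-one correspondence the finding asks for can be sketched by keying each re-scan pattern to the redaction target it enforces. A minimal Python illustration: the target names and regexes are hypothetical, not the PR's actual grep list, and a production backstop would likely need more patterns than these.

```python
from __future__ import annotations

import re

# Hypothetical target -> pattern table; one entry per declared redaction target.
RESCAN_PATTERNS = {
    "api_key_query": r"\bapi_key=",
    "x_api_key_header": r"\bX-Api-Key:",
    "cookie_header": r"\bCookie:",
    # Any scheme://user:pass@host, which also covers postgresql:// and mongodb+srv://.
    "credentialed_url": r"\b[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@",
    "signed_url_sig": r"[?&]sig=",
    "private_key_pem": r"-----BEGIN (?:RSA )?PRIVATE KEY-----",
    "certificate_pem": r"-----BEGIN CERTIFICATE-----",
    "service_account_json": r'"type"\s*:\s*"service_account"|"private_key"\s*:',
}
_COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in RESCAN_PATTERNS.items()}


def rescan(text: str) -> list[tuple[int, str]]:
    """Return (line number, target name) for every line that still looks sensitive."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in _COMPILED.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings
```

Keeping the table as data makes it straightforward to assert, in CI, that every target named in the reference has exactly one pattern entry.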
📄 instructions/r3/core/skills/requirements-authoring/references/authoring-catalogs.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Reference Integrity | Problem: The intro says 'The general authoring catalogs (unit template, EARS, schema fields, ID/filesystem/refactoring conventions) live inline in SKILL.md'. After this refactor the unit template, EARS, schema fields, and ID conventions no longer live inline in SKILL.md — they were extracted to the sibling reference requirement-catalogs.md (and the fill-in template lives in asset ra-requirement-unit.xml). Only filesystem and refactoring conventions still sit inline in SKILL.md. Reason: A synthesis agent following this note will look in SKILL.md for catalogs that were moved, wasting a step or missing the content; the pointer must match where the content actually landed after the extraction. Solution: Fix the pointer to say the unit template, EARS, schema fields, and ID conventions live in requirement-catalogs.md (template in asset ra-requirement-unit.xml), and that filesystem/refactoring conventions stay inline in SKILL.md. |
📄 instructions/r3/core/skills/requirements-use/SKILL.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| High | Single Responsibility | Problem: The added <gap_analysis> mode broadens a skill scoped to 'consume approved requirements for planning, implementation, and validation' into multi-source QA analysis. Its variants examine 'Jira, Confluence, TestRail, API spec, test cases, a test plan', including a 'Test-cases-vs-API-spec variant' (cross-reference test steps against API analysis) and a 'Test-plan variant' (evaluate five UI-QA test-plan completeness dimensions). Cross-referencing test steps to endpoint contracts and grading a UI-QA test plan are testing/QA responsibilities, not requirement consumption, giving the skill a distinct second job. Reason: One skill carrying two responsibilities enlarges its cognitive search space and weakens SRP; a QA test-plan analyzer bolted onto a requirements-use skill will be selected and loaded for the wrong tasks. Solution: Either narrow the mode to requirement-source contradiction/gap detection only, or move the test-cases-vs-API-spec and UI-QA test-plan variants into a QA-scoped skill and have requirements-use reference it. Keep requirements-use focused on approved-requirement usage. |
📄 instructions/r3/core/workflows/api-qa-flow.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Precision & Explicitness | Problem: The Phase-output gate mislabels the source phase for traceability. It says 'every ATC-NNN in test-specs.md traces to a Phase 3 source (a raw-data.md test case and/or an analysis.md finding)'. But raw-data.md is the Phase 1 artifact, not Phase 3; only analysis.md is Phase 3. An agent verifying ATC traceability may look for test-case sources under the wrong phase. Reason: Correct phase attribution matters here because the whole gate is a cross-phase traceability check; the current wording contradicts the phase numbering the same file defines. Solution: Reword to 'traces to a Phase 1 raw-data.md test case and/or a Phase 3 analysis.md G[N]/C[N]/A[N] finding' so each cited artifact is tied to its correct producing phase. |
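The corrected attribution can be made concrete with a small check that resolves each cited source to the phase that produced it. A minimal Python sketch: the `TC-12`-style raw-data IDs and the input shapes are hypothetical, while the G[N]/C[N]/A[N] finding IDs follow the wording in the finding.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Optional


def source_phase(ref: str, raw_cases: set, findings: set) -> Optional[str]:
    """Name the phase artifact that produced a cited source, or None if unknown."""
    if ref in raw_cases:
        return "Phase 1 (raw-data.md)"
    if ref in findings:
        return "Phase 3 (analysis.md)"
    return None


def untraced_atcs(
    citations: Mapping[str, Iterable[str]], raw_cases: set, findings: set
) -> list:
    """ATC IDs whose citations resolve to neither a Phase 1 case nor a Phase 3 finding."""
    return sorted(
        atc
        for atc, refs in citations.items()
        if not any(source_phase(r, raw_cases, findings) for r in refs)
    )
```

An ATC with no citations at all is reported as untraced, which matches the gate's "every ATC-NNN traces to" wording.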
📄 instructions/r3/core/workflows/api-qa-flow-api-spec-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Cognitive Budget | Problem: Step 2.2 item 1 packs the entire skill invocation into one long run-on sentence that binds six things at once: target-endpoint list, spec source, output shape, redaction scope, validation reference, and output path, plus a preceding ACQUIRE-first clause and a trailing sensitive-data clause. This single directive is the load-bearing hand-off for the phase yet is hard to parse and easy to partially execute. Reason: Agents reliably follow decomposed directives; a dense multi-binding sentence on the critical hand-off raises the chance an input (e.g. redaction scope) is dropped. Solution: Break step 2.2 item 1 into a short numbered list of bindings (one line per input: endpoints, spec source, output-shape asset, redaction reference, validation, output path), keeping the ACQUIRE-first and sensitive-data steps as separate ordered items. |
📄 instructions/r3/core/workflows/api-qa-flow-test-case-specification.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Structural Coherence | Problem: failure_handling says the repeated-rejection cap applies 'after the 3rd cycle of reject-and-re-present per <present_for_approval> step 3', but step 3 is only the 'DO NOT PROCEED without an exact approval token' rule. The actual reject-and-re-present loop is run by the approval-gate asset applied in step 2, so the internal cross-reference points at the wrong sub-step. Reason: A wrong internal step reference can send an agent to the wrong block when deciding whether the retry cap has been hit, weakening the escalation path. Solution: Point the reject-cycle reference at present_for_approval step 2 (the approval-gate application), or renumber so the loop and the cap reference the same step. |
📄 instructions/r3/core/workflows/api-qa-flow-test-correction.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Conflict Resolution | Problem: The retry-cap outcome is described two ways with the same 'retry-cap' term. correction_contract says 'on retry-cap, loop back to Phase 6', but apply_changes step 4 says after 3 failed apply cycles the agent must 'stop, record Phase 7 blocked ... and escalate to the user' (not loop to Phase 6). update_state step 4 adds a separate 'if tests still fail: return to Phase 6' loop. An agent reading only the contract could auto-loop to Phase 6 on the apply cap instead of stopping and escalating. Reason: Overloaded 'retry-cap' wording risks the agent choosing the wrong branch (silent re-loop vs required user escalation) after repeated apply failures. Solution: Disambiguate the two caps: name the in-phase apply retry cap (escalate to user) separately from the retest-failure loop (return to Phase 6), and make correction_contract's 'on retry-cap' line say which cap loops where. |
📄 instructions/r3/core/workflows/ui-qa-flow-code-analysis.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Structural Coherence | Problem: The parent workflow lists this phase's inputs as 'user request + CONTEXT.md + ARCHITECTURE.md + IMPLEMENTATION.md + test plan file', treating the three repo docs as primary. This phase's Input Contract table instead treats 'project_description.md (repo root)' as the primary Project description row and lists CONTEXT/ARCHITECTURE/IMPLEMENTATION as 'Optional repo docs ... read when present'. The two changed files disagree on which document is the authoritative primary source of standards. Reason: A reader following the parent may look for CONTEXT/ARCHITECTURE first while the phase file weights project_description.md first; the ambiguity is low-risk because the gate accepts either, but it can cause inconsistent standards extraction across runs. Solution: Align the two documents: either add project_description.md to the parent's Phase 3 input line, or make the phase table's primary row match the parent (repo docs), so an agent has one consistent notion of the primary standards source. The Input GATE ('project description OR one authoritative repo doc') already tolerates either, so this is a coherence fix, not a behavior break. |
📄 instructions/r3/core/workflows/ui-qa-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Self-Validation | Problem: redaction-scope.md defines a pre-emit grep re-scan as a distinct second pass, and qa-knowledge marks running it as mandatory, but on the collection path the only redaction actor is discovery's single sensitive-data scan (SKILL step 4). No phase or skill step explicitly instructs running the grep re-scan as a separate verification before writing to the public-by-default plan file. Reason: The primary sensitive-data scan is fail-closed so a secret is still likely caught (defense-in-depth weakening, not fail-open), but a token missed by the single model-driven scan would reach the public plan file with no grep backstop. Solution: In discovery SKILL step 4 or the phase's redaction note, add an explicit action: 're-scan the emitted section with the redaction-scope grep list before write', so the second pass is a named step, not merely referenced as scope. |
| Medium | Structural Coherence | Problem: The <zero_doc_protocol> block is defined physically inside <gather_confluence step="1.3">, but it is invoked earlier in <gather_testrail step="1.2"> ('If unresolvable with scope active, apply <zero_doc_protocol>') and in <workflow_context>. A phase reader hitting step 1.2 meets the reference before its definition, which appears in a later, differently-scoped step. Reason: The tag resolves within the same file, so behavior is safe, but the forward reference into a nested sibling step slightly hurts readability and locality for an agent processing steps in order. Solution: Hoist <zero_doc_protocol> to a top-level block in the phase (e.g. next to <workflow_context> or <phase_steps>) so both step 1.2 and step 1.3 reference a peer-level definition rather than one nested inside a sibling step. |
📄 instructions/r3/core/workflows/ui-qa-flow-selector-implementation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Failure Handling | Problem: This phase handles only one failure path: the missing Part A inventory (<part_a_inventory_gate>). Its three sibling phases (test-implementation, test-correction, test-report-analysis) each carry a dedicated failure block covering a missing/empty agents/ui-qa-state.md and an unresolvable <test-name> slug. This phase has neither, and step 5.2 only says 'Check linting/format ... fix errors' with no path for lint errors that cannot be auto-fixed. Reason: Every input/output path in the phase depends on the state file and resolved slug; without an explicit stop, the agent may guess a slug or write to the wrong page object, and unfixable lint has no defined exit. Solution: Add a short failure-handling block mirroring the sibling phases: stop and record if agents/ui-qa-state.md is missing or the <test-name> slug is unresolvable (do not guess), and define what to do when linting cannot be auto-fixed (surface it to the user; do not silently accept it). |
📄 instructions/r3/core/workflows/testgen-flow-data-collection.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Reference Integrity | Problem: The phase names another skill's private reference files while describing what it does internally: discovery "loads references/jira-binding.md" and "loads references/confluence-binding.md, which owns URL parsing, direct-URL-vs-search precedence, child-page traversal...". This exposes and semi-deep-links discovery's internal structure rather than treating it as an opaque skill. Reason: The relocation itself is confirmed correct (hardcoded MCP function names like mcp_Jira_MCP_jira_get_issue, CQL building, ranking, and child-page traversal moved into the discovery jira/confluence bindings, a real agent-agnostic improvement), but the phase should not narrate the callee skill's private file layout. Solution: Refer to the collection behavior by the vendor binding contract (issue/documentation binding) and drop the named internal file paths and internal-mechanism enumeration; let discovery own and hide those internals. |
📄 instructions/r3/core/workflows/testgen-flow-project-config-loading.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Reference Integrity | Problem: The new redaction-at-intake gate instructs ACQUIRE qa-knowledge/references/redaction-scope.md FROM KB, deep-linking a private reference inside another skill's folder rather than going through that skill's public interface. This crosses skill-folder isolation (a callee skill's internal references may be moved or renamed). Reason: The redaction gate is a strong new safety boundary (fail-closed, PUBLIC-by-default treatment, minimal always-present grep floor for Bearer/JWT/private-key/user:pass@), and the deep-link risk is partly mitigated by that floor, but the direct private-reference ACQUIRE is still a skill-isolation smell worth fixing. Solution: Reach the redaction scope through the skill's public surface (e.g. USE SKILL qa-knowledge / sensitive-data with a redaction-scope request) or relocate the shared grep list to a location not owned by a single skill's private references. Keep the existing fail-closed + minimal-grep floor as the fallback. |
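To make the fail-closed grep floor and the pre-emit re-scan requested for ui-qa-flow-data-collection more concrete, here is a minimal sketch. It is hypothetical: the regexes are illustrative guesses for the four categories the finding lists (Bearer tokens, JWTs, private keys, `user:pass@` URLs), not the actual contents of `redaction-scope.md`, and `GREP_FLOOR`, `rescan`, and `write_if_clean` are invented names. What it shows is the ordering: re-scan the final text, then refuse the write on any hit.

```python
import re

# Hypothetical minimal grep floor. Patterns are illustrative assumptions for
# the categories named in the review, NOT the contents of redaction-scope.md.
GREP_FLOOR = {
    "bearer": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "userinfo": re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@"),
}


def rescan(text: str) -> list[str]:
    """Return the names of floor patterns that still match the text."""
    return [name for name, pattern in GREP_FLOOR.items() if pattern.search(text)]


def write_if_clean(path: str, text: str) -> None:
    """Fail closed: run the re-scan as a separate pass and refuse to write on any hit."""
    hits = rescan(text)
    if hits:
        raise ValueError(f"redaction re-scan failed: {hits}")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
```

Keeping the floor as a plain in-code list is one way to keep the fallback independent of where a shared reference file ends up; whether that fits these workflows is for the skill owners to decide.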
📄 instructions/r3/core/workflows/testgen-flow-question-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Decision Branching | Problem: The base file defined how to assign each question a priority ('P0 = blocks implementation', 'P1 = significant quality impact', 'P2 = affects approach', 'P3 = minor clarification', grouped by risk from Phase 2). The new step 3.2 only says 'Group questions by priority: P0 (Critical, MUST answer), P1 (High), P2 (Medium), P3 (Low)' with no criteria for what makes a question P0 vs P1. This assignment logic was removed and not relocated. Reason: Without explicit priority criteria, two runs can group the same questions differently, and the P0-gate enforcement in validate_answers depends on questions being correctly marked P0 in the first place. Solution: Restore a one-line assignment rule per level (or state that a question inherits the priority/risk of its source issue in analysis.md), so priority grouping is repeatable rather than guessed. |
📄 instructions/r3/core/workflows/testgen-flow-requirements-document-generation.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| Medium | Bloat Control | Problem: The added 'Conscious tradeoff — why no inline per-entry fallback (declared once, not re-derived per turn)' block in failure_handling is roughly seven lines of design rationale (three justification bullets 'The skill is a hard dependency by design', 'Deployment guarantee', 'Section contract is phase-owned', plus two paragraphs comparing to the sibling test-case-generation file). The operational instruction it supports ('the phase blocks when the skill is unavailable; do NOT fabricate a partial requirements.md') is already stated in the bullet directly above it. Reason: pa-hardening requires removing non-operational rationale and explanatory meta-notes; the justification is redundant with the preceding instruction and enlarges the phase without changing agent behavior. Solution: Delete the rationale block; keep only the operational bullet that the phase blocks and must not fabricate a fallback when requirements-authoring is unavailable. Move any design justification to a change-log, not the target prompt. |
📄 instructions/r3/core/workflows/testgen-flow-test-case-export.md
⚠️ Issues Found
| Severity | Gate | Details |
|---|---|---|
| High | Conflict Resolution | Problem: The destructive TestRail-write confirmation is defined twice with different tokens: phase step 6.4b unblocks on 'yes'/'proceed', while the resolved scenarios-generation/references/testrail-export.md step 7 uses a different gate ('a' export all / 'b' export non-duplicates / 'c' cancel, default cancel). The two are not reconciled. Reason: Both gates fail safe, but the duplication risks a confusing double-prompt or the agent treating a 'yes' at 6.4b as satisfying the binding gate and skipping the binding's dedup + sensitive re-scan before an irreversible add_case write. Solution: Make one layer own the user-facing confirmation and have the other defer to it: either have phase step 6.4b delegate the confirm + dedup pre-scan entirely to the binding's a/b/c gate, or have the binding assume the phase already confirmed and only run the sensitive-value re-scan. Align the token sets. |
The QA, AQA, and Testgen workflows were tested, bug-fixed, and hardened in several places.
QA and AQA are now structurally unified.
Some common parts of the 3 workflows, along with the skills previously created by Maksym, are merged into 4 pre-existing skills and 2 new skills foreseen by `docs/definitions/skills.md`. Two unforeseen skills are created and added to the canonical list: `qa-knowledge` and `qa-structure`.
Instructions on how to test the workflows are also added in `docs/manual-tests`.
The idea of 'canonical lists' is now explicitly mentioned in the `coding-agents-prompt-authoring` skill (previously it was only mentioned in a reference file).