diff --git a/mcp_server/services/skills/specflow-planning/SKILL.md b/mcp_server/services/skills/specflow-planning/SKILL.md index a6c655d..3ea4d0a 100644 --- a/mcp_server/services/skills/specflow-planning/SKILL.md +++ b/mcp_server/services/skills/specflow-planning/SKILL.md @@ -19,16 +19,6 @@ When `run_generation` is later called, it MUST be invoked with the same `spec_di This runs entirely locally — no backend, no SpecFlow generation session. Re-runnable as the plan evolves. -## Execution constraints (local IDE — no backend tier enforcement) - -Backend planning ran on Opus with a large turn budget. Locally you must compensate: -1. Produce **10–20+ small phases** for non-trivial projects — prefer more phases over fewer large ones. We plan to fit in 120k context window. -2. Copy **all** locked dimensions from analysis into the plan's first section — no summarizing away Part D micro-locks. -3. For brownfield: read `<>/analysis/repo_summary.md` and `<>/`; phases must **extend** existing code, not rebuild from scratch. -4. Self-check before finishing: every feature in the spec has at least one phase; no phase exceeds the hard limits (2–3 tasks, 8–10 files). - -Inspect `<>/` for context (`<>/analysis/specification_index.md`, `<>/analysis/specification_completeness.md`, etc.). - ## Project Knowledge Base (if available) - Check `.claude/agents/` for specialized agent definitions with project-specific guidance - Check `<>/` for project context (CONTEXT.md, ARCHITECTURE.md, CODEMAP.md) @@ -37,11 +27,11 @@ Inspect `<>/` for context (`<>/analysis/specification_ Note: full Knowledge Base initialization happens later on the backend, inside the generation phase — you don't need it to plan. -Your task is to create a detailed implementation plan that breaks the work into phases. Each phase should be: +Break the work into phases. Each phase should be: - Self-contained (can run independently) - Testable (has associated unit tests) - Committable (produces meaningful commits) -- Focused on a logical unit of work +- Focused on a single logical unit of work ## Workflow @@ -55,7 +45,7 @@ Your task is to create a detailed implementation plan that breaks the work into ### 2. Create Implementation Plan - Write the plan to **EXACTLY** `<>/planning/IMPLEMENTATION_PLAN.md`. The backend contract validator expects this exact filename — any other name (e.g. `plan.md`, `implementation.md`) will cause `run_generation` to be rejected with `PLAN_MISSING`. -- If the file would exceed ~300 lines, write it in parts and merge with `cat`. +- **No length limit — completeness beats brevity.** Each phase is executed by a *fresh agent* that sees only this plan and the code committed so far; it never sees the reasoning of earlier phases. Give every phase enough pinned detail (contracts, deliverable files, dependencies, acceptance criteria) to execute without guessing. Write the file in parts and merge with `cat` if it grows large. **MANDATORY FIRST SECTION: "Architectural Decisions - Locked Values"** @@ -100,7 +90,69 @@ These values are LOCKED from <>/analysis/specification_completeness [List all features with their required sub-tasks] ``` -### Part F: Integration Environment — Locked Values +**SECOND SECTION: "Design Patterns & Architecture"** + +Right after the locked values, add a short section naming the architectural style and the cross-cutting +design patterns the whole solution will follow, so every phase agent applies them consistently: + +```markdown +## Design Patterns & Architecture + +- **Architectural style**: [e.g., layered / hexagonal / clean architecture; how modules depend on each other] +- **Cross-cutting patterns**: [the patterns used repeatedly across phases and why — e.g., Repository for + persistence, Dependency Injection for wiring, Factory for object construction, Strategy for pluggable + behaviour, Adapter for external/mocked integrations, Observer/pub-sub for events, State machine for lifecycle] +- **Modelling rules**: classes / dataclasses / Pydantic models and Enums over raw dicts and strings; + SRP and Open/Closed so changes are additive; patterns must enforce correctness at compile time and in unit tests +``` + +Individual phases must reference these patterns (see the per-phase **Design patterns** field below) rather +than re-deciding architecture. Choose patterns that fit the locked technology stack — do not force a pattern +where a plain function or module is clearer. + +**THIRD SECTION: "Shared Contracts"** — the single most important section for a multi-phase autonomous run. + +Because each phase agent runs with fresh context, any interface shared across phases MUST be pinned here so +later phases *import* it instead of re-inventing it. Unpinned contracts are the #1 cause of code that doesn't +compose (the API phase and the frontend phase invent different shapes and only collide at integration). +Pin every cross-phase seam you can determine from the specs and locked values: + +```markdown +## Shared Contracts + +- **Data model / DB schema**: entities, fields, types, relationships, keys — the schema every phase reads/writes +- **API surface**: each endpoint's method, path, request shape, response shape, status codes, error body +- **Shared types / DTOs**: named types used by more than one phase (with the module/file they live in) +- **Error taxonomy**: the canonical error types/codes and how they surface (exception classes, HTTP mapping) +- **Events / messages** (if any): event names and payload shapes +- **Config / env contract**: env var names, config keys, and their meaning +``` + +Rules: +- Derive contracts from the locked values and specs — do not invent requirements. If the spec leaves a + contract undetermined, decide it here **once** (see Assumptions below) so all phases agree. +- **Front-load a foundational phase** (Phase 1 or 2) whose deliverable is exactly these contracts as code + (schema/migrations, type definitions, API stubs/OpenAPI, error classes, config). Every downstream phase + lists that phase as a dependency and imports from it — no phase redefines a shared type. + +**FOURTH SECTION: "Assumptions & Resolved Ambiguities"** + +There is no human in the loop for the 6–8 hour run, so every ambiguity you resolve during planning must be +recorded here once — otherwise each phase agent resolves it differently and the app becomes internally +inconsistent. Keep it short and only for things the specs left genuinely open. + +```markdown +## Assumptions & Resolved Ambiguities + +| # | Ambiguity in the spec | Decision (what all phases must assume) | Basis | +|---|-----------------------|----------------------------------------|-------| +| 1 | [what was unclear] | [the single resolution] | [spec ref / locked value / convention] | +``` + +Do not use this section to override locked values or invent scope — only to fix under-specified details. + +**FIFTH SECTION (only if Part F = INTEGRATION_TESTS_READY): "Part F: Integration Environment — Locked Values"** + (Only if `<>/analysis/specification_completeness.md` Part F says INTEGRATION_TESTS_READY) Copy all integration details from Part F's locked values table into `<>/planning/IMPLEMENTATION_PLAN.md` @@ -127,7 +179,7 @@ Planning guidance for INTEGRATION_TESTS_READY: ### CRITICAL: PHASE SIZING — SMALL, FOCUSED PHASES -- **Prefer more phases over bigger phases.** Typical projects need 10-20+ phases. +- **Prefer more phases over bigger phases.** Typical projects need 10-20+ phases — plan to fit each in a ~120k context window. - Each phase MUST focus on a **single component and a single concern** (e.g., "Backend: auth endpoints" not "Backend + Frontend auth"). - **Hard limits per phase:** - Maximum 2-3 tasks per phase @@ -154,11 +206,27 @@ Planning guidance for INTEGRATION_TESTS_READY: - Each phase should have: - A clear name and description - List of tasks to be completed (2-3 max) - - Dependencies on previous phases (if any) + - **Deliverable files**: the explicit list of files this phase creates or modifies (paths). No two phases + may create the same file. This makes the 8–10 file limit checkable and, for brownfield, makes "extend, + don't rebuild" verifiable. + - **Dependencies**: the specific earlier phases this phase requires and *what* it consumes from each + (e.g. `Depends on Phase 2 — imports the DB schema; Phase 5 — imports the API client`). "None" if + foundational. Never depend on a phase that runs later. + - **Contracts**: which entries from the "Shared Contracts" section this phase produces or consumes. A + phase implementing an interface others use must produce exactly the pinned contract; a consumer imports + it — it does not redefine it. + - **Acceptance criteria (Definition of Done)**: concrete, verifiable exit conditions — which unit tests + pass, what observable behavior works, and an explicit scope fence (what this phase deliberately does + NOT do). This is how the autonomous agent knows when to stop; without it, it under- or over-builds. + - **Design patterns**: name which patterns from the "Design Patterns & Architecture" section this + phase applies, and where. Don't force a pattern where a plain function is clearer. - **ENFORCEMENT CHECKPOINT**: List which Part D conventions apply to this phase - Phases should follow a logical progression: - Phase 1: Project setup, dependencies, basic structure (following Part A and Part D conventions) - - Phase 2-N: Core functionality implementation (following Parts B, C, D conventions) + - Foundational phase (early): emit the **Shared Contracts** as code — schema/migrations, shared types, + API stubs/OpenAPI, error classes, config — so every later phase imports them + - Phase 2-N: Core functionality implementation (following Parts B, C, D conventions), each importing the + pinned contracts rather than redefining them - Final Phase: Testing, documentation, deployment configuration (following Part A and Part E) ### Per-phase optional agent MCPs (markdown annotation) @@ -181,21 +249,21 @@ Example phase heading: ### Example Phase Breakdown for a task management app with 4 features (dashboard, projects, tasks, settings): - Phase 1: Project Setup - Initialize both backend and frontend projects, dependencies, config -- Phase 2: Backend Data Models & Schema - Database schema, ORM models, migrations -- Phase 3: Backend API & Services - Endpoints, business logic, auth middleware -- Phase 4: Backend Error Handling & Tests - Error handling, validation, unit tests -- Phase 5: Frontend Shell & Shared Components - Layout, routing, design system, API client -- Phase 6: Frontend: Dashboard Feature - Dashboard page, widgets, data visualization -- Phase 7: Frontend: Projects Feature - Project list, create/edit, project detail page -- Phase 8: Frontend: Tasks Feature - Task board, task CRUD, drag-and-drop, filters -- Phase 9: Frontend: Settings Feature - User settings, preferences, profile management -- Phase 10: Frontend: Auth Flow & Error Handling - Login/signup pages, protected routes, error boundaries -- Phase 11: Frontend Tests - Component and integration tests across features -- Phase 12: Infrastructure & Deployment - Docker, docker-compose, environment config -- Phase 13: Integration Testing & Polish - E2E tests, final verification, documentation - -Notice: backend is 3 phases (layer-split), frontend is 7 phases (feature-split). This is intentional — -frontend features are heavier and each one involves components, state, styling, and API integration. +- Phase 2: Shared Contracts Foundation - Emit the **Shared Contracts** as code: DB schema/migrations, shared + DTOs/types, API stubs (OpenAPI), error taxonomy, config contract. Every later phase depends on and imports + from this phase; no later phase redefines a shared type. +- Phase 3: Backend Data Models & Persistence - ORM models and repositories implementing the Phase 2 schema +- Phase 4: Backend API & Services - Endpoints, business logic, auth middleware (implements the Phase 2 API surface) +- Phase 5: Backend Error Handling & Tests - Error handling, validation, unit tests +- Phase 6: Frontend Shell & Shared Components - Layout, routing, design system, API client (typed from Phase 2 contracts) +- Phase 7: Frontend: Dashboard Feature - Dashboard page, widgets, data visualization +- Phase 8: Frontend: Projects Feature - Project list, create/edit, project detail page +- Phase 9: Frontend: Tasks Feature - Task board, task CRUD, drag-and-drop, filters +- Phase 10: Frontend: Settings Feature - User settings, preferences, profile management +- Phase 11: Frontend: Auth Flow & Error Handling - Login/signup pages, protected routes, error boundaries +- Phase 12: Frontend Tests - Component and integration tests across features +- Phase 13: Infrastructure & Deployment - Docker, docker-compose, environment config +- Phase 14: Integration Testing & Polish - E2E tests, final verification, documentation ### Important - Do NOT start implementation — only create the plan @@ -203,7 +271,29 @@ frontend features are heavier and each one involves components, state, styling, - Each phase should represent 1-3 days of focused work - Create ONLY markdown files (`IMPLEMENTATION_PLAN.md` and optionally `e2e-test-plan.md`) +### 3. Review the plan with a subagent + +After the plan file(s) are written, spawn a **fresh subagent** to review them with clean context — do not +review your own work inline. Give the subagent the plan file path(s), `<>/`, and +`<>/analysis/specification_completeness.md`, and ask it to check for: + +- **Missing scope** — every feature, requirement, and locked dimension in the specs and + `specification_completeness.md` (Parts A–F) is covered by at least one phase; nothing silently dropped. +- **Incorrect statements** — no claim in the plan contradicts the specs, the locked values, or the + existing `<>/` code (for brownfield); no phase re-specifies what already exists; no invented + requirements or unsupported assumptions. +- **Structural issues** — phases respect the hard sizing limits, dependencies are ordered correctly, and + the named design patterns fit the locked technology stack. +- **Contract & composition integrity** — every contract a phase consumes is produced by an earlier phase + and matches the "Shared Contracts" section (no shape drift); no two phases create the same deliverable + file; the dependency graph is acyclic with no forward references; every phase has concrete acceptance + criteria; every resolved ambiguity is recorded once in the Assumptions section. + +The subagent returns a list of concrete gaps and corrections. Apply the fixes to the plan file(s), then +re-run the review if the changes were substantial. Only report the plan as done once the review is clean. + When done, state: - Path of plan file(s) written - Phase count +- Summary of what the review subagent found and how it was resolved - Next step: run `run_generation` to start parallel codegen agents (2–8 hours). diff --git a/mcp_server/tests/test_specflow_local_tools.py b/mcp_server/tests/test_specflow_local_tools.py index b0b0a8a..a019097 100644 --- a/mcp_server/tests/test_specflow_local_tools.py +++ b/mcp_server/tests/test_specflow_local_tools.py @@ -134,7 +134,6 @@ def test_planning_skill_recommends_high_tier_models(self) -> None: assert "recommended-models:" in content assert "claude-opus-4.6" in content - assert "Execution constraints" in content @pytest.mark.parametrize("skill_name", ["specflow-analysis", "specflow-planning"]) def test_argument_hint_includes_src_dir(self, skill_name: str) -> None: