Skip to content

Expand test coverage for the play recall pipeline #20

@genezhang

Description

@genezhang

Context

PR #19 added the play recall pipeline (recordPlay, resolveProblem, buildEditChanges, recallPlays with similarity gating + per-session memo) and shipped focused regression tests for the three bugs Copilot flagged (#7 JOIN aliasing, #3 NULL-distance fail-closed, #2 subject-vs-content fast-path).

The remaining surface area is uncovered. This issue tracks expanding tests against the live module, not as a blocker for #19.

Gaps worth covering

resolveProblem fallback ladder — the function tries (1) first user turn's first text part, (2) session.title row. Tests should cover:

  • both paths return the user text when present
  • title fallback fires when no user turn exists yet (early-projection race)
  • empty string return when neither source has anything ≥ 20 chars
  • malformed JSON in part.data doesn't crash (safeParse swallows it)

buildEditChanges matrix — the JOIN-aliasing test pins the SQL but not the data shapes. Should cover:

  • tool=edit with oldString/newString → captured as EditChange
  • tool=write with content → captured (oldString empty)
  • tool=multiedit → captured
  • non-edit tools (read/grep/bash) skipped
  • error/pending state skipped (only completed/success)
  • missing filePath skipped
  • EDIT_CAPTURE_MAX = 3 cap respected on a runaway edit loop
  • string-encoded data (Zeta's JSONB-as-string return shape) decoded same as object form

recallPlays query-text + cache behavior — currently uncovered:

  • BENCH_PREAMBLE strip: text containing \n---\n separator embeds only the tail; text without the separator embeds in full
  • per-session memo: same excludeSessionId returns cached array on second call (verify SQL only fires once)
  • FIFO eviction at PLAYS_CACHE_MAX = 64: 65th distinct session evicts the oldest
  • excludeSessionId properly filters self-recall via source_session != $4
  • limit clamping to [1, 10]
  • ZENGRAM_PLAY_MAX_DISTANCE env override is parsed (numeric, > 0) and falls back to 0.5 for non-numeric / non-positive

recordPlay write-skip and embedding refresh — beyond the subject/content fast-path:

  • short-problem early return (problem.length < 20)
  • empty-content early return (no modified files + no edits)
  • INSERT path on first write of a session
  • UPDATE path via ON CONFLICT (id) DO UPDATE on subsequent writes
  • embedding refresh fires after the upsert (fire-and-forget, but observable via execute-call dispatch)

renderPlayContent rendering — pure function, easy wins:

  • modified files list when workspace_file has rows
  • fallback to deriving file list from edits when workspace_file is empty
  • (deleted) tag for deleted files
  • truncateForPlay cuts collapse-whitespace + 600-char preview
  • sanitizePath escapes &, <, > so play content can't break out of XML tags

toRepoRelative cross-platform — path normalization:

  • forward-slash input + Linux-style rootDir → relative
  • backslash input (Windows-style) → forward-slash relative
  • file outside rootDir → preserves original absolute path (no ../../ breadcrumbs)
  • no rootDir → returns input unchanged

Out of scope

  • Pre-existing recallWorkspaceContext test failures (7 on main) — separate issue.
  • Integration tests against a real Zeta DB — the existing tests mock the client; that's the established pattern in test/knowledge/index.test.ts.

Why it matters

The bench numbers in PR #19 came from this pipeline working as a system. Each piece (preamble strip, distance gate, memo, edit capture) was a real bug fix; without test coverage a future refactor can silently regress any one of them and we'd only notice on the next bench pass.

Linked: PR #19, review comment #9.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions