feat(knowledge): novelty-gated write-back (agents' KB #1) #222

Merged: mkreyman merged 1 commit into master from knowledge-novelty-gate on Jun 30, 2026
@mkreyman (Owner)

What — agents' KB goodie #1: novelty-gated write-back

Turns the KB from a blind write sink into a curated one. When an agent proposes an article (POST /api/v1/articles), it's gated against the published corpus instead of publishing whatever it sends. The KB does only the mechanical part — embed → cosine → route — and the merge decision stays with the consuming agent, which is a step smarter than the KB.

Pipeline (default ON; force: true bypasses)

ProposalGate.assess embeds "{title}\n\n{body}", runs VectorSearch.nearest, and classifies the top similarity:

| top cosine | verdict | action |
|---|---|---|
| ≥ 0.97 | duplicate | create nothing; return 200 pointing at the canonical article |
| ≥ 0.88 | low_novelty | create as a draft, stamp metadata.proposal_novelty (score + nearest ids) for a reviewer/consumer to merge |
| < 0.88 | novel | create on the requested path |

The response carries gate: {verdict, similarity, nearest} so the agent can act in-session (read/update the canonical instead of duplicating).
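
The three bands above can be sketched as a pure function. This is an illustrative Python sketch, not the project's Elixir code: the function name `classify` and its keyword arguments are hypothetical, and treating both thresholds as inclusive lower bounds follows the `>= 0.97` / `>= 0.88` wording in the commit message.

```python
# Illustrative sketch only; names are hypothetical, not the project's API.
DUPLICATE_THRESHOLD = 0.97  # default duplicate band from the PR description
OVERLAP_THRESHOLD = 0.88    # default low-novelty (overlap) band


def classify(top_similarity: float,
             duplicate: float = DUPLICATE_THRESHOLD,
             overlap: float = OVERLAP_THRESHOLD) -> str:
    """Map the top cosine similarity to a verdict string."""
    if top_similarity >= duplicate:
        return "duplicate"    # create nothing, point at the canonical article
    if top_similarity >= overlap:
        return "low_novelty"  # create as a draft for later merge/review
    return "novel"            # create on the requested path
```

Keeping the banding free of I/O is what makes the "pure classify bands" tests possible: the thresholds can be varied without touching embeddings or the database.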

Design

  • Behaviour + config DI (ProposalAssessorBehaviour / MockProposalAssessor) → propose_article/3 is unit-testable, the assessor is swappable.
  • Resilient — ANY embedding failure (API down, power/internet outage) or a system-scoped (nil-tenant) proposal falls open (:unknown) → the gate never blocks a write.
  • Non-destructive — flags duplicates, never edits/deletes; a vanished canonical neighbor falls through to create.
  • Reuses existing machinery: status :draft + list_drafts review queue, the embedding client, VectorSearch (raw-vector path, no row needed). No new status.
  • Thresholds config-tunable (:knowledge_proposal_{duplicate,overlap}_threshold).
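
The fall-open and injection points in the list above might look roughly like the following. This is a Python sketch under stated assumptions, not the Elixir implementation: `assess`, `embed`, and `nearest` are hypothetical callables standing in for ProposalGate.assess, the embedding client, and VectorSearch.nearest, and returning `novel` when no neighbor exists is an assumption the PR does not state.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Neighbor = Tuple[str, float]  # (article id, cosine similarity), best first


def assess(tenant_id: Optional[str], title: str, body: str,
           embed: Callable[[str], Sequence[float]],
           nearest: Callable[[str, Sequence[float]], List[Neighbor]],
           duplicate: float = 0.97, overlap: float = 0.88):
    """Return (verdict, top_similarity, nearest_ids); falls open on failure."""
    if tenant_id is None:
        # System-scoped proposal: fall open, never block the write.
        return ("unknown", None, [])
    try:
        vector = embed(f"{title}\n\n{body}")
    except Exception:
        # Any embedding failure (API down, outage) also falls open.
        return ("unknown", None, [])
    neighbors = nearest(tenant_id, vector)
    if not neighbors:
        # Assumption: an empty corpus counts as novel.
        return ("novel", None, [])
    top = neighbors[0][1]
    ids = [article_id for article_id, _ in neighbors]
    if top >= duplicate:
        verdict = "duplicate"
    elif top >= overlap:
        verdict = "low_novelty"
    else:
        verdict = "novel"
    return (verdict, top, ids)
```

Passing `embed` and `nearest` in as arguments plays the role the behaviour + mock pair plays in the PR: a test can substitute a failing or canned embedder without a network call.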

Tests (+19)

Pure classify bands; real-pgvector assess (duplicate / low / novel / fall-open / system-scope); propose_article routing incl. canonical-vanished + tenant isolation; controller verdict→HTTP rendering + force bypass. Default stub is :novel so all 44 existing create tests stay green.

Full gate green locally: format, credo --strict, dialyzer, 3037 tests, 0 failures.

Follow-up (next PR)

MCP knowledge_create tool description/params (surface gate.verdict + force) and reconcile debrief.md (its "always creates drafts" note predates this; the server now enforces the gate). The gate is already live for MCP clients since it's enforced server-side at the API.

Turn the KB from a blind write sink into a curated one: when an agent proposes an
article, gate it against the published corpus instead of publishing whatever it
sends. The KB does only the MECHANICAL part (embed → cosine → route); the merge
decision stays with the consuming agent, which is a step smarter than the KB.

Pipeline (POST /api/v1/articles, default ON; force:true bypasses):
  ProposalGate.assess → embed '{title}\n\n{body}', VectorSearch.nearest, classify:
    >= 0.97 duplicate  → create nothing, 200 + point at the canonical article
    >= 0.88 overlap    → create as a DRAFT, stamp metadata.proposal_novelty
                         (score + nearest ids) for a reviewer/consumer to merge
    <  0.88 novel      → create on the requested path
  Response carries gate: {verdict, similarity, nearest} so the agent can act in-session.

Design choices:
- Behaviour + config DI (ProposalAssessorBehaviour / MockProposalAssessor), so
  propose_article/3 is unit-testable and the assessor is swappable.
- Resilient: ANY embedding failure (API down, power/internet outage) or a
  system-scoped (nil-tenant) proposal falls OPEN (:unknown) → never blocks a write.
- Non-destructive: flags duplicates, never edits/deletes; a vanished canonical
  neighbor falls through to create.
- Reuses existing machinery — status :draft + list_drafts review queue, the
  embedding client, VectorSearch (raw-vector path, no row needed). No new status.
- Thresholds config-tunable (:knowledge_proposal_{duplicate,overlap}_threshold).

Tests (+19): pure classify bands; real-pgvector assess (duplicate/low/novel/
fall-open/system-scope); propose_article routing incl. canonical-vanished + tenant
isolation; controller verdict→HTTP rendering + force bypass. Default stub is :novel
so all existing create tests stay green. Full gate green (3037 tests, dialyzer, credo).
@mkreyman mkreyman merged commit 1897617 into master Jun 30, 2026
9 checks passed
@mkreyman mkreyman deleted the knowledge-novelty-gate branch June 30, 2026 21:25
mkreyman added a commit that referenced this pull request Jun 30, 2026
…s (v2.25.0) (#223)

The novelty-gated write-back (#222) is enforced server-side, so MCP clients already
get gated behavior. This makes the agent AWARE of it: knowledge_create's description
now explains the gate.verdict outcomes (duplicate → nothing created, read data.id;
gated_to_draft → created as draft with metadata.proposal_novelty; created → novel),
and adds a force:true param to bypass the gate. +2 tests for the force passthrough.
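
On the agent side, acting on the verdict could be sketched as below. This is a hypothetical Python helper, not part of the MCP package: the response is assumed to be a parsed JSON object with top-level `gate` and `data` keys, and treating any verdict other than `duplicate` or `gated_to_draft` as a plain create is an assumption.

```python
def next_step(response: dict) -> str:
    """Describe what an agent might do after a knowledge_create call."""
    verdict = response.get("gate", {}).get("verdict")
    article_id = response.get("data", {}).get("id")
    if verdict == "duplicate":
        # Nothing was created; data.id points at the canonical article.
        return f"read canonical article {article_id}"
    if verdict == "gated_to_draft":
        # Created as a draft carrying metadata.proposal_novelty.
        return f"review draft {article_id} against its nearest neighbors"
    # "created" (novel); other verdicts are treated the same way here.
    return f"article {article_id} created"
```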

Bumps to 2.25.0 (mcp-autopublish publishes on package.json version change).