Epic 28: loopctl as the agent knowledge platform — move stable knowledge primitives + ingestion into the API (after Epic 27) #179

@mkreyman

Thesis

Turn loopctl from "a store the Python skills talk to" into the platform that does the knowledge work and serves LLM agents directly. Today the harvesting/retrieval logic lives in ~6 Python skills (a venv each, loopctl_client.py copied 6×, symlinked dirs, env-key juggling, cross-machine drift — a whole claude-config-sync skill exists only to fight that drift). Consolidating the stable knowledge primitives into loopctl (Elixir, tested, deployed once, reachable by any agent/machine/user via API+MCP) removes that mess and makes the value a sellable product (cf. the engine's own ContextForge + RAGBench proposals).

Sequence: AFTER Epic 27 (knowledge scale hardening, #175/#176/#177). Productizing primitives that aren't yet scale-correct just bakes the bugs in. Harden first, then expose the hardened primitives as the platform API.

The cut (what moves where) — the load-bearing decision

| Bucket | What | Goes to |
| --- | --- | --- |
| Primitives | ingest→chunk→synthesize→dedup→embed→link→store; retrieval; quality/prune; dedup; graph; pairs/novelty/walk | loopctl (Elixir API) |
| Local bridge | local file-tree discovery (Dropbox/Synology/gdrive/~/workspace), OCR (tesseract), Calibre convert, portfolio/gh gathering | thin claude-config skill (can't disappear: loopctl can't reach the user's disk) |
| Orchestration | the idea/creativity engine; (future) content-production | agent skill over the primitives; keep prompts/model/tilt out of the deploy cycle |

Keystone deliverable: a server-side ingestion endpoint

POST /api/v1/knowledge/ingest (Oban job + status): accepts {text | file | url, source_type, format, project_id, extra_tags} and runs the full pipeline server-side — extract (for server-reachable formats) → chunk → synthesize (configurable LLM; port the careful prompts) → dedup via idempotency tag → embed → link → store. This single endpoint absorbs the synthesis + publish half of book/document/code/web/youtube-knowledge-extract. Format extraction that needs local resources (OCR of local scans, Calibre) stays in the bridge, which uploads already-extracted text.
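
A minimal sketch of what a client-side request builder for this endpoint could look like. The field names come from the list above; everything else is an assumption for illustration: that exactly one of text/file/url is allowed per request, that `project_id` is a string, and the `idem:` tag format. Whether the idempotency tag is computed by the client or by the server is not decided here.

```python
import hashlib
from typing import List, Optional


def idempotency_tag(text: str, source_type: str) -> str:
    """Hypothetical tag format: hash of whitespace-normalized, lowercased text."""
    normalized = " ".join(text.split()).lower()
    payload = f"{source_type}\n{normalized}".encode("utf-8")
    return "idem:" + hashlib.sha256(payload).hexdigest()[:16]


def build_ingest_request(
    *,
    source_type: str,
    format: str,
    project_id: str,
    text: Optional[str] = None,
    file: Optional[str] = None,
    url: Optional[str] = None,
    extra_tags: Optional[List[str]] = None,
) -> dict:
    """Build a JSON-serializable body for POST /api/v1/knowledge/ingest."""
    candidates = {"text": text, "file": file, "url": url}
    provided = {k: v for k, v in candidates.items() if v is not None}
    # Assumption: the endpoint takes exactly one content source per request.
    if len(provided) != 1:
        raise ValueError("exactly one of text, file, or url is required")

    body = {
        "source_type": source_type,
        "format": format,
        "project_id": project_id,
        "extra_tags": list(extra_tags or []),
    }
    body.update(provided)

    # Assumption: for inline text the client attaches the dedup tag itself,
    # so a re-upload of the same content can be recognized.
    if text is not None:
        body["extra_tags"].append(idempotency_tag(text, source_type))
    return body
```

Two uploads of the same text that differ only in whitespace or case produce the same tag, which is the property dedup needs. A real client would also have to poll the job status the endpoint returns; that part is left out because its shape is not specified yet.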

Inventory (current scripts → target)

  • extract_book/docs/repo/web/youtube .py synthesis+publish → ingest endpoint. Their local extraction (OCR/Calibre/file-walk) → bridge skill.
  • prune_kb.py (LLM-judged junk removal) → curation endpoint / Oban job (server-side quality).
  • cleanup_frontmatter.py, dedup, idempotency → fold into ingest + curation.
  • backup.py → already covered once Epic 27 #176 (streamed export) lands. #176 tracks the current blocker: Export hard-caps at 5,000 articles (413), so it can't back up a real KB; the fix is to stream the bundle instead of materializing it.
  • idea-synthesizer (creativity engine) + portfolio.py → stays an agent skill over loopctl primitives.
  • second-brain-orchestrator (monitor/harvest_all) → thins to a watcher that feeds the ingest API.

Companion: retrieval-quality measurement (the "dogfood" win)

Build the RAGBench dogfood loop as part of the platform: continuously score loopctl's own retrieval quality (chunking/ranking/recall) on its own corpus, surfaced as a health metric. This is the measurement layer Epic 27 Theme 1 (observability) implies, and it's the proof-of-concept that sells the platform.
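
To make "score retrieval quality" concrete, here is a rough sketch of two standard ranking metrics, recall@k and mean reciprocal rank, computed over a labeled query set. The metric choice, the function names, and the data shapes (query → ranked article IDs, query → set of relevant IDs) are assumptions for illustration; the RAGBench proposal may define its scoring differently.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def score_retrieval(
    results: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average both metrics over every labeled query that has results."""
    queries = [q for q in gold if q in results]
    if not queries:
        return {"recall_at_k": 0.0, "mrr": 0.0, "queries": 0}
    recall = sum(recall_at_k(results[q], gold[q], k) for q in queries)
    mrr = sum(reciprocal_rank(results[q], gold[q]) for q in queries)
    n = len(queries)
    return {"recall_at_k": recall / n, "mrr": mrr / n, "queries": n}
```

Run on a schedule against a fixed query set, the two averages give a trend line that moves when chunking or ranking changes, which is the kind of health metric described above.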

Honest caveats

  • Move the stable logic; keep experimenting in scripts. OCR/idempotency/chunking are stable → port. The creativity tuning and any new extractor still change weekly → keep as fast-iterating skills until they settle. Don't trade Python's iteration speed for Elixir's deploy ceremony prematurely.
  • The local bridge never fully disappears — but it shrinks from ~600-line harvesters to a ~50-line uploader.
  • LLM cost/model/keys move server-side for the ingest synthesis — needs per-tenant config, rate limits, cost attribution.
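
For a sense of scale on the bridge caveat above, a thin uploader could be roughly this small. This is a sketch under assumptions, not the planned implementation: the suffix list, the `source_type` value, the `path:` tag, and the injected `send` callable (standing in for an HTTP POST to the ingest endpoint) are all hypothetical. OCR and Calibre conversion are assumed to have already produced plain-text files.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: the bridge only uploads files that are already plain text.
TEXT_SUFFIXES = {".txt", ".md"}


def discover(root: Path) -> List[Path]:
    """Walk the local tree and return uploadable files in a stable order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TEXT_SUFFIXES
    )


def upload_tree(root: Path, project_id: str, send: Callable[[dict], None]) -> int:
    """Send one ingest request per non-empty text file; return how many were sent."""
    sent = 0
    for path in discover(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():
            continue  # skip empty files rather than queueing no-op jobs
        send({
            "text": text,
            "source_type": "document",  # hypothetical value
            "format": path.suffix.lstrip(".").lower(),
            "project_id": project_id,
            # Hypothetical tag so the server can tell where the text came from.
            "extra_tags": [f"path:{path.relative_to(root).as_posix()}"],
        })
        sent += 1
    return sent
```

Injecting `send` keeps the walker testable without a network; the real bridge would pass a function that POSTs to the ingest endpoint with the user's API key.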

Out of scope (separate, later)

  • Loop-execution architecture refactor (gen_statem/supervision/telemetry) — its own design-first epic, lower priority (loop enforcement matters less as agents improve).
  • Content-production pipeline (corpus → drafted posts/newsletters) — an agent skill, not a loopctl endpoint, per the cut above.

Labels: enhancement (New feature or request), epic (Multi-PR architectural track)
