theagenticguy · Jul 4, 2026 · Jul 3, 2026
@@ -11,7 +11,7 @@ graph_hash: <from list_repos>
 
 ## Repo profile                      # from project_profile
 - languages: TypeScript 87%, Rust 11%, Python 2%
-- stacks: Node 22, pnpm 10, DuckDB, Vitest
+- stacks: Node 24, pnpm 10, SQLite (node:sqlite), Vitest
 - entry points: packages/mcp/src/index.ts, packages/cli/src/bin.ts
 
 ## Top communities (≤ 10)            # from sql: SELECT name, inferred_label, cohesion, symbol_count
@@ -80,17 +80,20 @@ File-level fan-out means one role may seed multiple packets (for example, `doc-a
 
 ## Schema preflight (non-optional)
 
-**Before composing any SQL query over `nodes`, `relations`, or any other
-graph table, Phase 0 MUST probe the schema once and cache the result in
-`.prefetch.md`.** Subagents then consult the cached schema instead of
-guessing column names, which would fail with `Binder Error: Referenced
-column "X" not found in FROM clause`.
+**Before composing any SQL query over `nodes`, `edges`, or any other
+table in `store.sqlite`, Phase 0 MUST probe the schema once and cache the
+result in `.prefetch.md`.** Subagents then consult the cached schema
+instead of guessing column names, which would fail with a `no such column`
+SQLite error.
 
-The probe is one SQL call:
+The probe is one SQL call over SQLite's schema catalog:
 
 ```
-sql("SELECT table_name, column_name FROM information_schema.columns
-     WHERE table_name IN ('nodes','relations') ORDER BY table_name, column_name")
+sql("SELECT m.name AS table_name, c.name AS column_name
+     FROM sqlite_master m
+     JOIN pragma_table_info(m.name) c
+     WHERE m.type = 'table' AND m.name IN ('nodes','edges')
+     ORDER BY table_name, column_name")
 ```
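
The probe above can be tried outside the MCP tool. The following is a minimal sketch, assuming Python's standard-library `sqlite3` as a stand-in for `mcp__codehub__sql` and a hypothetical in-memory schema whose column names follow the guide's description of `nodes` and `edges`; the real `store.sqlite` may carry more columns, and `has_column` is an illustrative helper, not part of OpenCodeHub.

```python
import sqlite3

# Hypothetical stand-in for store.sqlite; the real DDL may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT,
                        file_path TEXT, start_line INTEGER,
                        end_line INTEGER, payload TEXT);
    CREATE TABLE edges (src TEXT, dst TEXT, type TEXT,
                        confidence REAL, step INTEGER, reason TEXT);
""")

# Same catalog query as the preflight probe.
PROBE = """
    SELECT m.name AS table_name, c.name AS column_name
    FROM sqlite_master m
    JOIN pragma_table_info(m.name) c
    WHERE m.type = 'table' AND m.name IN ('nodes', 'edges')
    ORDER BY table_name, column_name
"""

# Probe once and cache the result as {table: [columns]}.
schema = {}
for table_name, column_name in conn.execute(PROBE):
    schema.setdefault(table_name, []).append(column_name)


def has_column(table, column):
    """Check the cached schema instead of guessing a column name."""
    return column in schema.get(table, [])
```

A subagent-style check such as `has_column("nodes", "path")` then returns `False` before any query is composed, which is the failure the preflight is meant to catch early.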
 
 Write the result as a dedicated `.context.md § Schema` subsection (top 30
@@ -100,8 +103,8 @@ rows, no cap) and as a digest line in `.prefetch.md` with
 Historical note: `nodes` does not have a `path` column — routes store their
 endpoint under `name` (as `"METHOD /path"`), and the file path is
 `file_path`. Observed during a 2026-04-27 dogfood when subagent prompts
-blindly referenced `path` and hit a Binder Error on an otherwise fresh
-graph. The preflight prevents this class of bug across every subagent.
+blindly referenced `path` and hit a `no such column` error on an otherwise
+fresh index. The preflight prevents this class of bug across every subagent.
 
 ## Phase 0 algorithm (pseudocode)
 
@@ -111,7 +114,7 @@ Steps marked `# wave 0a` and `# wave 0b` each run as a single parallel tool-use
 # wave 0a — independent precompute (one parallel batch)
 1.  staleness = list_repos → entry for this repo → _meta.codehub/staleness
 2.  profile = project_profile({repo})
-3.  schema = sql("SELECT table_name, column_name FROM information_schema.columns …")
+3.  schema = sql("SELECT … FROM sqlite_master JOIN pragma_table_info(name) …")
 4.  routes = route_map({repo})
 5.  tools = tool_map({repo})
 6.  deps = dependencies({repo})
@@ -126,7 +129,7 @@ Steps marked `# wave 0a` and `# wave 0b` each run as a single parallel tool-use
 # wave 0b — depends on schema + profile (one parallel batch)
 11. communities = sql("SELECT … FROM nodes WHERE kind='Community' …")
 12. processes   = sql("SELECT … FROM nodes WHERE kind='Process' …")
-13. relations   = sql("SELECT … FROM relations …")   # for diagrams
+13. relations   = sql("SELECT … FROM edges …")   # for diagrams
 14. top_folders = top-5 folders by file count (from profile.entryPoints + glob)
 15. owners_summary = [owners({path}) for path in top_folders]
 16. if --group: group_hits = group_query({group, canonical_terms})
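
The two-wave ordering in the pseudocode can be illustrated with a small concurrency sketch. It assumes Python's `asyncio` and a hypothetical `call_tool` coroutine standing in for the MCP tools; only a subset of the steps is shown, and the query strings are placeholders rather than the real SQL.

```python
import asyncio


async def call_tool(name, **args):
    """Hypothetical stand-in for an MCP tool call."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return {"tool": name, "args": args}


async def phase0(repo):
    # wave 0a: independent calls, issued as one concurrent batch
    profile, schema, routes = await asyncio.gather(
        call_tool("project_profile", repo=repo),
        call_tool("sql", query="<schema probe>"),
        call_tool("route_map", repo=repo),
    )
    # wave 0b: starts only after wave 0a, since it depends on schema + profile
    communities, processes = await asyncio.gather(
        call_tool("sql", query="<communities query>"),
        call_tool("sql", query="<processes query>"),
    )
    return {
        "profile": profile,
        "schema": schema,
        "routes": routes,
        "communities": communities,
        "processes": processes,
    }


ledger = asyncio.run(phase0("demo-repo"))
```

The point of the split is that nothing in wave 0b is scheduled until every wave 0a result is available, while calls inside a wave do not wait on each other.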

@@ -23,7 +23,7 @@ Cites `packages/foo/src/index.ts` (200 LOC) style file references.
 | Layer | Technology | Source |
 |---|---|---|
 | Runtime | Node 22 | `package.json:7` |
-| Storage | DuckDB + hnsw_acorn | `packages/storage/src/index.ts:12` |
+| Storage | SQLite (single-file, node:sqlite) — FTS5 + vector KNN | `packages/storage/src/index.ts:12` |
 | ... | ... | ... |
 
 ## Module map

@@ -14,11 +14,11 @@ flowchart LR
   core[Core types]
   ingestion[Ingestion DAG]
   storage[Storage]
-  duckdb[(DuckDB)]:::external
+  sqlite[(store.sqlite)]:::external
   mcp --> core
   ingestion --> core
   ingestion --> storage
-  storage --> duckdb
+  storage --> sqlite
   classDef external stroke-dasharray: 3 3
 ```
 
@@ -104,14 +104,14 @@ For `architecture/data-flow.md`.
 flowchart TB
   source[Repo files]
   parse[tree-sitter parser]
-  graph[DuckDB graph]
+  store[(store.sqlite)]
   embed[ONNX embedder]
   query[MCP query]
   source --> parse
-  parse --> graph
+  parse --> store
   parse --> embed
-  embed --> graph
-  query --> graph
+  embed --> store
+  query --> store
 ```
 
 **Rules:**

@@ -26,7 +26,7 @@ Produce `{{ docs_root }}/diagrams/architecture/components.md`: a single Mermaid
 | Shared context | `Read {{ context_path }}` | always first |
 | Prefetch ledger | `Read {{ prefetch_path }}` | always first |
 | Top communities | `{{ context_path }} § Top communities` | cached |
-| Community relations | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT source, target, kind FROM relations WHERE kind IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
+| Community relations | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT src, dst, type FROM edges WHERE type IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
 | Component method list | `mcp__codehub__context({symbol: <community-name>})` per top 8 | mid-run |
 
 ## 4. Process

@@ -26,7 +26,7 @@ Produce `{{ docs_root }}/diagrams/structural/dependency-graph.md`: a single Merm
 | Shared context | `Read {{ context_path }}` | always first |
 | Prefetch ledger | `Read {{ prefetch_path }}` | always first |
 | Top communities | `{{ context_path }} § Top communities` | cached |
-| Internal edges | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT source, target, kind FROM relations WHERE kind IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
+| Internal edges | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT src, dst, type FROM edges WHERE type IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
 | External dependencies | `{{ context_path }} § Stack` or `mcp__codehub__dependencies({repo: "{{ repo }}"})` | cached if digest present; mid-run otherwise |
 
 ## 4. Process

@@ -52,7 +52,7 @@ Produces a single ONBOARDING.md with a ranked reading order drawn from graph cen
 | Layer | Tech | Source |
 |---|---|---|
 | Runtime | Node 22 | `package.json:7` |
-| Storage | DuckDB | `packages/storage/src/index.ts:12` |
+| Storage | SQLite (single-file, node:sqlite) | `packages/storage/src/index.ts:12` |
 | ... | ... | ... |
 
 ## Read these 10 files first (in order)

@@ -86,20 +86,20 @@ Two-hop upstream trace for every caller of `validatePayment`:
 
 ```sql
 WITH direct AS (
-  SELECT from_id, to_id, 1 AS depth
-  FROM relations
+  SELECT src, dst, 1 AS depth
+  FROM edges
   WHERE type = 'CALLS'
-    AND to_id IN (SELECT id FROM nodes WHERE name = 'validatePayment' AND kind = 'Function')
+    AND dst IN (SELECT id FROM nodes WHERE name = 'validatePayment' AND kind = 'Function')
 ),
 indirect AS (
-  SELECT r.from_id, d.to_id, 2 AS depth
-  FROM relations r
-  JOIN direct d ON d.from_id = r.to_id
+  SELECT r.src, d.dst, 2 AS depth
+  FROM edges r
+  JOIN direct d ON d.src = r.dst
   WHERE r.type = 'CALLS'
 )
 SELECT caller.name, caller.file_path, caller.start_line, u.depth
 FROM (SELECT * FROM direct UNION ALL SELECT * FROM indirect) u
-JOIN nodes caller ON caller.id = u.from_id
+JOIN nodes caller ON caller.id = u.src
 ORDER BY u.depth ASC, caller.name;
 ```
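
To see what the rewritten two-hop query returns, here is a minimal sketch that runs it against a toy graph. It assumes Python's standard-library `sqlite3` and a hypothetical three-function call chain; the table definitions are trimmed to the columns the query touches and are not the real `store.sqlite` DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT,
                        file_path TEXT, start_line INTEGER);
    CREATE TABLE edges (src TEXT, dst TEXT, type TEXT);
    -- Hypothetical chain: handleRequest -> checkout -> validatePayment
    INSERT INTO nodes VALUES
      ('f1', 'Function', 'validatePayment', 'pay.ts', 10),
      ('f2', 'Function', 'checkout',        'cart.ts', 20),
      ('f3', 'Function', 'handleRequest',   'api.ts', 30);
    INSERT INTO edges VALUES
      ('f2', 'f1', 'CALLS'),
      ('f3', 'f2', 'CALLS');
""")

TWO_HOP = """
WITH direct AS (
  SELECT src, dst, 1 AS depth
  FROM edges
  WHERE type = 'CALLS'
    AND dst IN (SELECT id FROM nodes
                WHERE name = 'validatePayment' AND kind = 'Function')
),
indirect AS (
  SELECT r.src, d.dst, 2 AS depth
  FROM edges r
  JOIN direct d ON d.src = r.dst
  WHERE r.type = 'CALLS'
)
SELECT caller.name, caller.file_path, caller.start_line, u.depth
FROM (SELECT * FROM direct UNION ALL SELECT * FROM indirect) u
JOIN nodes caller ON caller.id = u.src
ORDER BY u.depth ASC, caller.name
"""

# Each row is (caller name, file, line, hop depth).
callers = conn.execute(TWO_HOP).fetchall()
```

On this toy data the direct caller `checkout` comes back at depth 1 and `handleRequest` at depth 2.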
 

@@ -75,9 +75,9 @@ When a name is ambiguous, `context` returns a ranked candidate list instead of s
 
 ```sql
 SELECT r.step, callee.name, callee.file_path, callee.start_line
-FROM relations r
-JOIN nodes proc   ON proc.id = r.from_id
-JOIN nodes callee ON callee.id = r.to_id
+FROM edges r
+JOIN nodes proc   ON proc.id = r.src
+JOIN nodes callee ON callee.id = r.dst
 WHERE r.type = 'PROCESS_STEP'
   AND proc.kind = 'Process'
   AND proc.name = 'CheckoutFlow'

@@ -5,7 +5,7 @@ description: "Use when the user asks about OpenCodeHub itself — available MCP
 
 # OpenCodeHub Guide
 
-Quick reference for every OpenCodeHub MCP tool, MCP resource, and the graph + temporal store schema.
+Quick reference for every OpenCodeHub MCP tool, MCP resource, and the single-file `store.sqlite` schema.
 
 ## Always Start Here
 
@@ -59,7 +59,7 @@ standalone artifact producer with its own preconditions and output path.
 | `mcp__codehub__context`           | 360-degree symbol view + `confidenceBreakdown` + `cochanges` side-section |
 | `mcp__codehub__impact`            | Blast radius with risk tier + `confidenceBreakdown`                       |
 | `mcp__codehub__detect_changes`    | Map an uncommitted or committed diff to affected symbols and flows        |
-| `mcp__codehub__sql`               | Read-only query: `sql` arg → temporal DuckDB (cochanges/summaries); `cypher` arg → lbug graph (5 s timeout) |
+| `mcp__codehub__sql`               | Read-only SQL over the single-file `store.sqlite` (all tables: nodes, edges, embeddings, cochanges, symbol_summaries, store_meta; 5 s timeout). `cypher` arg is reserved for community-fork adapters (unsupported by the default backend) |
 | `mcp__codehub__signature`         | Symbol declaration + stubbed members (class/interface header + method/property signatures, bodies elided) |
 
 ### HTTP / RPC surface
@@ -115,91 +115,135 @@ Lightweight reads for navigation (every URI uses the `codehub://` scheme):
 | `codehub://repo/{name}/context`                | Stats + staleness envelope                  |
 | `codehub://repo/{name}/schema`                 | Live node kinds / relation types for `sql`  |
 
-> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or Cypher (below) filtered to `kind = 'Community'` / `kind = 'Process'`.
-
-## Where the graph lives (ADR 0016)
-
-There are **two stores**, and they are queried differently:
-
-- **Graph tier — `graph.lbug`** (ladybug, Cypher dialect). Holds nodes, edges,
-  and embeddings. Query it via the typed tools (`query` / `context` / `impact` /
-  `route_map` / …) or, for bespoke questions, **Cypher** via the MCP `sql`
-  tool's `cypher` argument. There is NO `nodes` or `relations` SQL table.
-- **Temporal tier — `temporal.duckdb`** (DuckDB SQL). Holds only the
-  `cochanges` and `symbol_summaries` tables. The `sql` argument of the MCP
-  `sql` tool (and `codehub sql` on the CLI) targets THIS store.
-
-Pass exactly one of `sql` (temporal DuckDB) or `cypher` (lbug graph) to the MCP
-`sql` tool.
-
-### Graph schema (lbug / Cypher)
-
-One node label `CodeNode` carrying `kind` as a **property** (NOT a per-kind
-label). One relationship table per relation type. Properties are **snake_case**
-(`file_path`, `start_line`, `inferred_label`, `step_count`, `entry_point_id`);
-a camelCase RETURN alias comes back as the alias you give it, but the stored
-property names are snake_case.
-
-**Node kinds** (`n.kind` values): File, Folder, Function, Class, Method,
+> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or a `sql` query (below) filtered to `kind = 'Community'` / `kind = 'Process'`.
+
+## Where the index lives (ADR 0019)
+
+There is **one store**: a single-file `<repo>/.codehub/store.sqlite`
+(WAL, via Node's built-in `node:sqlite`). ADR 0019 supersedes ADR 0016:
+the old two-tier backend (a `graph.lbug` Ladybug graph plus a
+`temporal.duckdb` DuckDB file) is gone. One `SqliteStore` class implements
+both the graph and temporal surfaces over that single file.
+
+Everything is directly SQL-queryable through the MCP `sql` tool's `sql`
+argument (and `codehub sql` on the CLI):
+
+- **Graph tables (`nodes` and `edges`).** `nodes` holds the typed base
+  columns plus a `payload` JSON overflow; `edges` is one polymorphic table
+  keyed by `(src, dst, type, step)`. Query them via the typed tools
+  (`query` / `context` / `impact` / `route_map` / …) or, for bespoke
+  questions, plain SQL. Multi-hop traversal is a recursive SQL CTE over
+  `edges`, NOT Cypher.
+- **Embeddings (the `embeddings` table).** Vectors live in a BLOB column;
+  there is NO Parquet sidecar (it was dropped with DuckDB).
+- **Temporal tables (`cochanges` and `symbol_summaries`).** Same file, no
+  second engine.
+- **`store_meta`.** Index metadata (graph hash, timestamps).
+
+Full-text search is BM25 via a SQLite FTS5 virtual table (`nodes_fts`).
+The `cypher` argument to the MCP `sql` tool is **reserved for community-fork
+graph adapters** (AGE / Memgraph / Neo4j / Neptune) and is **NOT supported
+by the default SQLite backend**, so pass `sql` for every query against the
+default store.
+
+### Graph schema (`nodes` / `edges` tables)
+
+The `nodes` table carries typed base columns (`id`, `kind`, `name`,
+`file_path`, `start_line`, `end_line`) plus a `payload` JSON column holding
+every kind-specific field. Reach payload fields with SQLite JSON1:
+`payload->>'$.inferredLabel'`, `payload->>'$.stepCount'`,
+`payload->>'$.entryPointId'`, `payload->>'$.cohesion'`,
+`payload->>'$.symbolCount'`.
+
+The `edges` table is polymorphic: `src`, `dst`, `type`, `confidence`,
+`step`, `reason`. The relation kind lives in the `type` column (there is no
+per-type table).
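
As an illustration of the base-column plus `payload` split, the sketch below stores two hypothetical Community nodes and reads their kind-specific fields back. It assumes Python's standard-library `sqlite3` built with the JSON functions, and it uses `json_extract(payload, '$.x')` as the function form of `payload->>'$.x'`, since the `->>` operator needs SQLite 3.38 or newer. The sample values are invented.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT, payload TEXT)"
)

# Hypothetical rows: kind-specific fields live in the JSON payload.
rows = [
    ("c1", "Community", "storage",
     json.dumps({"inferredLabel": "Storage layer", "cohesion": 0.82})),
    ("c2", "Community", "cli",
     json.dumps({"inferredLabel": "CLI", "cohesion": 0.35})),
]
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)

# Typed base columns are selected directly; payload fields go through JSON1.
communities = conn.execute("""
    SELECT name,
           json_extract(payload, '$.inferredLabel') AS label,
           json_extract(payload, '$.cohesion') AS cohesion
    FROM nodes
    WHERE kind = 'Community'
    ORDER BY cohesion DESC
""").fetchall()
```

Because the extracted `cohesion` comes back as a numeric SQL value, the `ORDER BY` sorts numerically rather than as text.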
+
+**Node kinds** (`kind` values): File, Folder, Function, Class, Method,
 Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait,
 Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate,
 Annotation, Template, Module, CodeElement, Community, Process, Route, Tool,
 Finding, Dependency, Contributor, Repo, ProjectProfile, Section.
 
-**Relationship types** (each is its own edge label): CONTAINS, DEFINES, IMPORTS,
+**Relationship types** (`edges.type` values): CONTAINS, DEFINES, IMPORTS,
 CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES,
 OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES,
 HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON,
 OWNED_BY.
 
-Cochanges live only in the **temporal** `cochanges` table (DuckDB SQL), never as
-graph edges.
+Cochanges live only in the `cochanges` table, never as graph edges.
 
-## Cypher cheat-sheet (MCP `sql` tool, `cypher` arg)
+## SQL cheat-sheet (MCP `sql` tool, `sql` arg)
 
-All inbound callers of a function by name:
+All inbound callers of a function by name (join `edges` to `nodes` on both
+endpoints):
 
-```cypher
-MATCH (caller:CodeNode)-[r:CALLS]->(callee:CodeNode)
+```sql
+SELECT caller.name AS name, caller.file_path AS file,
+       caller.start_line AS line, e.confidence AS confidence,
+       e.reason AS reason
+FROM edges e
+JOIN nodes caller ON caller.id = e.src
+JOIN nodes callee ON callee.id = e.dst
 WHERE callee.name = 'validateUser' AND callee.kind = 'Function'
-RETURN caller.name AS name, caller.file_path AS file, caller.start_line AS line,
-       r.confidence AS confidence, r.reason AS reason
-ORDER BY r.confidence DESC
-LIMIT 50
+  AND e.type = 'CALLS'
+ORDER BY e.confidence DESC
+LIMIT 50;
 ```
 
-Top communities by cohesion:
+Top communities by cohesion (kind-specific fields via JSON1):
 
-```cypher
-MATCH (n:CodeNode)
-WHERE n.kind = 'Community'
-RETURN n.name AS name, n.inferred_label AS label, n.cohesion AS cohesion,
-       n.symbol_count AS symbols
-ORDER BY n.cohesion DESC
-LIMIT 20
+```sql
+SELECT name,
+       payload->>'$.inferredLabel' AS label,
+       payload->>'$.cohesion' AS cohesion,
+       payload->>'$.symbolCount' AS symbols
+FROM nodes
+WHERE kind = 'Community'
+ORDER BY cohesion DESC
+LIMIT 20;
 ```
 
 Process entry points:
 
-```cypher
-MATCH (n:CodeNode)
-WHERE n.kind = 'Process'
-RETURN n.name AS name, n.inferred_label AS label, n.step_count AS steps,
-       n.entry_point_id AS entry_point
-ORDER BY n.step_count DESC
+```sql
+SELECT name,
+       payload->>'$.inferredLabel' AS label,
+       payload->>'$.stepCount' AS steps,
+       payload->>'$.entryPointId' AS entry_point
+FROM nodes
+WHERE kind = 'Process'
+ORDER BY steps DESC;
 ```
 
 SCIP-confirmed CALLS edges only (strict impact):
 
-```cypher
-MATCH ()-[r:CALLS]->()
-WHERE r.confidence >= 0.95 AND r.reason STARTS WITH 'scip:'
-RETURN r
+```sql
+SELECT * FROM edges
+WHERE type = 'CALLS'
+  AND confidence >= 0.95
+  AND reason LIKE 'scip:%';
+```
+
+Multi-hop blast radius is a recursive CTE over `edges`. The typed `impact`
+tool wraps this, so prefer it unless you need a bespoke traversal:
+
+```sql
+WITH RECURSIVE reach(id, depth) AS (
+  SELECT id, 0 FROM nodes WHERE name = 'validateUser'
+  UNION
+  SELECT e.src, r.depth + 1
+  FROM edges e JOIN reach r ON e.dst = r.id
+  WHERE e.type IN ('CALLS', 'REFERENCES') AND r.depth < 3
+)
+SELECT DISTINCT n.name, n.file_path, MIN(r.depth) AS depth
+FROM reach r JOIN nodes n ON n.id = r.id
+GROUP BY n.id ORDER BY depth;
 ```
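
A small run of the recursive CTE shows how the depth bound and `MIN(depth)` interact. This is a sketch under assumptions: Python's standard-library `sqlite3`, a hypothetical four-node graph, tables trimmed to the columns the query uses, and an added `n.name` tie-break in the `ORDER BY` so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE edges (src TEXT, dst TEXT, type TEXT);
    INSERT INTO nodes VALUES
      ('a', 'validateUser'), ('b', 'login'),
      ('c', 'handleLogin'),  ('d', 'bootstrap');
    INSERT INTO edges VALUES
      ('b', 'a', 'CALLS'),       -- login calls validateUser
      ('c', 'b', 'CALLS'),       -- handleLogin calls login
      ('c', 'a', 'REFERENCES'),  -- handleLogin also references validateUser
      ('d', 'a', 'IMPORTS');     -- not a traversed edge type
""")

REACH = """
WITH RECURSIVE reach(id, depth) AS (
  SELECT id, 0 FROM nodes WHERE name = 'validateUser'
  UNION
  SELECT e.src, r.depth + 1
  FROM edges e JOIN reach r ON e.dst = r.id
  WHERE e.type IN ('CALLS', 'REFERENCES') AND r.depth < 3
)
SELECT n.name, MIN(r.depth) AS depth
FROM reach r JOIN nodes n ON n.id = r.id
GROUP BY n.id
ORDER BY depth, n.name
"""

# Each row is (symbol name, shortest hop distance from the seed).
blast_radius = conn.execute(REACH).fetchall()
```

Here `handleLogin` is reachable at depth 1 and depth 2, and `MIN` keeps the shorter path; `bootstrap` never appears because `IMPORTS` is outside the traversed edge types.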
 
-### Temporal SQL cheat-sheet (MCP `sql` tool, `sql` arg)
+### Co-change cheat-sheet (MCP `sql` tool, `sql` arg)
 
-Tightest co-change pairs (DuckDB SQL — temporal store):
+Tightest co-change pairs (`cochanges` table):
 
 ```sql
 SELECT source_file, target_file, lift, cocommit_count