Merged
29 changes: 16 additions & 13 deletions .claude/skills/codehub-document/references/data-source-map.md
@@ -11,7 +11,7 @@ graph_hash: <from list_repos>

## Repo profile # from project_profile
- languages: TypeScript 87%, Rust 11%, Python 2%
- stacks: Node 22, pnpm 10, DuckDB, Vitest
- stacks: Node 24, pnpm 10, SQLite (node:sqlite), Vitest
- entry points: packages/mcp/src/index.ts, packages/cli/src/bin.ts

## Top communities (≤ 10) # from sql: SELECT name, inferred_label, cohesion, symbol_count
@@ -80,17 +80,20 @@ File-level fan-out means one role may seed multiple packets (for example, `doc-a

## Schema preflight (non-optional)

**Before composing any SQL query over `nodes`, `relations`, or any other
graph table, Phase 0 MUST probe the schema once and cache the result in
`.prefetch.md`.** Subagents then consult the cached schema instead of
guessing column names, which would fail with `Binder Error: Referenced
column "X" not found in FROM clause`.
**Before composing any SQL query over `nodes`, `edges`, or any other
table in `store.sqlite`, Phase 0 MUST probe the schema once and cache the
result in `.prefetch.md`.** Subagents then consult the cached schema
instead of guessing column names, which would fail with a `no such column`
SQLite error.

The probe is one SQL call:
The probe is one SQL call over SQLite's schema catalog:

```
sql("SELECT table_name, column_name FROM information_schema.columns
WHERE table_name IN ('nodes','relations') ORDER BY table_name, column_name")
sql("SELECT m.name AS table_name, c.name AS column_name
FROM sqlite_master m
JOIN pragma_table_info(m.name) c
WHERE m.type = 'table' AND m.name IN ('nodes','edges')
ORDER BY table_name, column_name")
```
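As a sanity check, the probe can be run with Node's built-in `node:sqlite` against an in-memory stand-in for `store.sqlite`. This is a minimal sketch: the two `CREATE TABLE` statements only mirror column names mentioned elsewhere in this PR (the real DDL may differ), and `probeSchema` / `schemaDigest` are hypothetical helpers, not part of the skill.

```typescript
import { DatabaseSync } from "node:sqlite";

type ColumnRow = { table_name: string; column_name: string };

// The preflight probe from this section (whitespace aside).
const SCHEMA_PROBE = `
  SELECT m.name AS table_name, c.name AS column_name
  FROM sqlite_master m
  JOIN pragma_table_info(m.name) c
  WHERE m.type = 'table' AND m.name IN ('nodes','edges')
  ORDER BY table_name, column_name`;

// Hypothetical helper: run the probe once and group column names by table.
function probeSchema(db: DatabaseSync): Map<string, string[]> {
  const rows = db.prepare(SCHEMA_PROBE).all() as ColumnRow[];
  const schema = new Map<string, string[]>();
  for (const { table_name, column_name } of rows) {
    const cols = schema.get(table_name) ?? [];
    cols.push(column_name);
    schema.set(table_name, cols);
  }
  return schema;
}

// Hypothetical helper: one digest line per table for `.prefetch.md`.
function schemaDigest(schema: Map<string, string[]>): string[] {
  return [...schema].map(([table, cols]) => `${table}: ${cols.join(", ")}`);
}

// Assumed stand-in DDL; only the column names are taken from this PR.
const db = new DatabaseSync(":memory:");
db.exec(`
  CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT,
    file_path TEXT, start_line INTEGER, end_line INTEGER, payload TEXT);
  CREATE TABLE edges (src TEXT, dst TEXT, type TEXT, confidence REAL,
    step INTEGER, reason TEXT);
`);

const schema = probeSchema(db);
const digest = schemaDigest(schema);
console.log(digest.join("\n"));
```

A subagent holding this map can check `schema.get("nodes")?.includes("path")` before referencing a column, instead of discovering a missing column from a query error.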

Write the result as a dedicated `.context.md § Schema` subsection (top 30
@@ -100,8 +103,8 @@ rows, no cap) and as a digest line in `.prefetch.md` with
Historical note: `nodes` does not have a `path` column — routes store their
endpoint under `name` (as `"METHOD /path"`), and the file path is
`file_path`. Observed during a 2026-04-27 dogfood when subagent prompts
blindly referenced `path` and hit a Binder Error on an otherwise fresh
graph. The preflight prevents this class of bug across every subagent.
blindly referenced `path` and hit a `no such column` error on an otherwise
fresh index. The preflight prevents this class of bug across every subagent.
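Because a route's endpoint lives in `name` as `"METHOD /path"`, code that needs the path has to split it out of `name` rather than select a `path` column. A minimal sketch; `parseRouteName` is a hypothetical helper, and it assumes a single space separates the HTTP method from a path that starts with `/`.

```typescript
// Hypothetical helper: split a Route node's `name` ("METHOD /path") into
// its HTTP method and path. Returns null when the name has another shape.
function parseRouteName(name: string): { method: string; path: string } | null {
  const space = name.indexOf(" ");
  if (space <= 0) return null; // no method token before the first space
  const method = name.slice(0, space).toUpperCase();
  const path = name.slice(space + 1).trim();
  if (!path.startsWith("/")) return null; // assumed: paths are absolute
  return { method, path };
}

console.log(parseRouteName("POST /api/payments/:id"));
```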

## Phase 0 algorithm (pseudocode)

@@ -111,7 +114,7 @@ Steps marked `# wave 0a` and `# wave 0b` each run as a single parallel tool-use
# wave 0a — independent precompute (one parallel batch)
1. staleness = list_repos → entry for this repo → _meta.codehub/staleness
2. profile = project_profile({repo})
3. schema = sql("SELECT table_name, column_name FROM information_schema.columns …")
3. schema = sql("SELECT … FROM sqlite_master JOIN pragma_table_info(name) …")
4. routes = route_map({repo})
5. tools = tool_map({repo})
6. deps = dependencies({repo})
@@ -126,7 +129,7 @@ Steps marked `# wave 0a` and `# wave 0b` each run as a single parallel tool-use
# wave 0b — depends on schema + profile (one parallel batch)
11. communities = sql("SELECT … FROM nodes WHERE kind='Community' …")
12. processes = sql("SELECT … FROM nodes WHERE kind='Process' …")
13. relations = sql("SELECT … FROM relations …") # for diagrams
13. relations = sql("SELECT … FROM edges …") # for diagrams
14. top_folders = top-5 folders by file count (from profile.entryPoints + glob)
15. owners_summary = [owners({path}) for path in top_folders]
16. if --group: group_hits = group_query({group, canonical_terms})
@@ -23,7 +23,7 @@ Cites `packages/foo/src/index.ts` (200 LOC) style file references.
| Layer | Technology | Source |
|---|---|---|
| Runtime | Node 22 | `package.json:7` |
| Storage | DuckDB + hnsw_acorn | `packages/storage/src/index.ts:12` |
| Storage | SQLite (single-file, node:sqlite) — FTS5 + vector KNN | `packages/storage/src/index.ts:12` |
| ... | ... | ... |

## Module map
12 changes: 6 additions & 6 deletions .claude/skills/codehub-document/references/mermaid-patterns.md
@@ -14,11 +14,11 @@ flowchart LR
core[Core types]
ingestion[Ingestion DAG]
storage[Storage]
duckdb[(DuckDB)]:::external
sqlite[(store.sqlite)]:::external
mcp --> core
ingestion --> core
ingestion --> storage
storage --> duckdb
storage --> sqlite
classDef external stroke-dasharray: 3 3
```

@@ -104,14 +104,14 @@ For `architecture/data-flow.md`.
flowchart TB
source[Repo files]
parse[tree-sitter parser]
graph[DuckDB graph]
store[(store.sqlite)]
embed[ONNX embedder]
query[MCP query]
source --> parse
parse --> graph
parse --> store
parse --> embed
embed --> graph
query --> graph
embed --> store
query --> store
```

**Rules:**
@@ -26,7 +26,7 @@ Produce `{{ docs_root }}/diagrams/architecture/components.md`: a single Mermaid
| Shared context | `Read {{ context_path }}` | always first |
| Prefetch ledger | `Read {{ prefetch_path }}` | always first |
| Top communities | `{{ context_path }} § Top communities` | cached |
| Community relations | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT source, target, kind FROM relations WHERE kind IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
| Community relations | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT src, dst, type FROM edges WHERE type IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
| Component method list | `mcp__codehub__context({symbol: <community-name>})` per top 8 | mid-run |

## 4. Process
@@ -26,7 +26,7 @@ Produce `{{ docs_root }}/diagrams/structural/dependency-graph.md`: a single Merm
| Shared context | `Read {{ context_path }}` | always first |
| Prefetch ledger | `Read {{ prefetch_path }}` | always first |
| Top communities | `{{ context_path }} § Top communities` | cached |
| Internal edges | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT source, target, kind FROM relations WHERE kind IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
| Internal edges | `{{ prefetch_path }} § sql relations` or `mcp__codehub__sql({query: "SELECT src, dst, type FROM edges WHERE type IN ('CONTAINS','CALLS','IMPORTS') LIMIT 500"})` | cached if digest present; mid-run otherwise |
| External dependencies | `{{ context_path }} § Stack` or `mcp__codehub__dependencies({repo: "{{ repo }}"})` | cached if digest present; mid-run otherwise |
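The rows returned by the `Internal edges` query (`src`, `dst`, `type`) map directly onto a Mermaid `flowchart`. A minimal sketch of that rendering step, assuming the rows are already fetched; `toMermaid` and `EdgeRow` are hypothetical names, not the skill's renderer. Nodes get short positional ids (`n0`, `n1` and so on) so file paths never need to be valid Mermaid identifiers, and repeated `(src, dst, type)` triples collapse to a single arrow.

```typescript
type EdgeRow = { src: string; dst: string; type: string };

// Hypothetical renderer: declare each endpoint once, then emit one
// labelled arrow per distinct (src, dst, type) triple.
function toMermaid(rows: EdgeRow[], direction: "LR" | "TB" = "LR"): string {
  const lines = [`flowchart ${direction}`];
  const ids = new Map<string, string>(); // raw node id -> Mermaid id
  const seen = new Set<string>(); // edge triples already emitted

  const idFor = (raw: string): string => {
    let id = ids.get(raw);
    if (id === undefined) {
      id = `n${ids.size}`;
      ids.set(raw, id);
      // Double quotes would end the label early, so use Mermaid's entity code.
      lines.push(`  ${id}["${raw.replace(/"/g, "#quot;")}"]`);
    }
    return id;
  };

  for (const { src, dst, type } of rows) {
    const key = JSON.stringify([src, dst, type]);
    if (seen.has(key)) continue;
    seen.add(key);
    const from = idFor(src);
    const to = idFor(dst);
    lines.push(`  ${from} -->|${type}| ${to}`);
  }
  return lines.join("\n");
}

const diagram = toMermaid([
  { src: "packages/mcp", dst: "packages/core", type: "IMPORTS" },
  { src: "packages/ingestion", dst: "packages/core", type: "IMPORTS" },
  { src: "packages/mcp", dst: "packages/core", type: "IMPORTS" }, // duplicate
]);
console.log(diagram);
```

Positional ids avoid the collisions a character-replacing sanitizer would hit (for example `a/b` and `a.b` both becoming `a_b`).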

## 4. Process
2 changes: 1 addition & 1 deletion .claude/skills/codehub-onboarding/SKILL.md
@@ -52,7 +52,7 @@ Produces a single ONBOARDING.md with a ranked reading order drawn from graph cen
| Layer | Tech | Source |
|---|---|---|
| Runtime | Node 22 | `package.json:7` |
| Storage | DuckDB | `packages/storage/src/index.ts:12` |
| Storage | SQLite (single-file, node:sqlite) | `packages/storage/src/index.ts:12` |
| ... | ... | ... |

## Read these 10 files first (in order)
14 changes: 7 additions & 7 deletions .claude/skills/opencodehub-debugging/SKILL.md
@@ -86,20 +86,20 @@ Two-hop upstream trace for every caller of `validatePayment`:

```sql
WITH direct AS (
SELECT from_id, to_id, 1 AS depth
FROM relations
SELECT src, dst, 1 AS depth
FROM edges
WHERE type = 'CALLS'
AND to_id IN (SELECT id FROM nodes WHERE name = 'validatePayment' AND kind = 'Function')
AND dst IN (SELECT id FROM nodes WHERE name = 'validatePayment' AND kind = 'Function')
),
indirect AS (
SELECT r.from_id, d.to_id, 2 AS depth
FROM relations r
JOIN direct d ON d.from_id = r.to_id
SELECT r.src, d.dst, 2 AS depth
FROM edges r
JOIN direct d ON d.src = r.dst
WHERE r.type = 'CALLS'
)
SELECT caller.name, caller.file_path, caller.start_line, u.depth
FROM (SELECT * FROM direct UNION ALL SELECT * FROM indirect) u
JOIN nodes caller ON caller.id = u.from_id
JOIN nodes caller ON caller.id = u.src
ORDER BY u.depth ASC, caller.name;
```
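To sanity-check the two-hop trace before pointing it at a real index, it can be run against a toy graph in an in-memory database via `node:sqlite`. The trimmed table definitions and the `checkout` → `charge` → `validatePayment` call chain are invented fixtures, not OpenCodeHub data; only the column names come from this PR.

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
// Assumed minimal stand-ins for the real `nodes` / `edges` tables.
db.exec(`
  CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT,
    file_path TEXT, start_line INTEGER);
  CREATE TABLE edges (src TEXT, dst TEXT, type TEXT);
  INSERT INTO nodes VALUES
    ('f1', 'Function', 'validatePayment', 'src/pay.ts', 10),
    ('f2', 'Function', 'charge', 'src/pay.ts', 40),
    ('f3', 'Function', 'checkout', 'src/cart.ts', 5);
  -- checkout calls charge, which calls validatePayment
  INSERT INTO edges VALUES ('f2', 'f1', 'CALLS'), ('f3', 'f2', 'CALLS');
`);

// The two-hop upstream trace from this section, unchanged.
const TWO_HOP = `
WITH direct AS (
  SELECT src, dst, 1 AS depth
  FROM edges
  WHERE type = 'CALLS'
    AND dst IN (SELECT id FROM nodes WHERE name = 'validatePayment' AND kind = 'Function')
),
indirect AS (
  SELECT r.src, d.dst, 2 AS depth
  FROM edges r
  JOIN direct d ON d.src = r.dst
  WHERE r.type = 'CALLS'
)
SELECT caller.name, caller.file_path, caller.start_line, u.depth
FROM (SELECT * FROM direct UNION ALL SELECT * FROM indirect) u
JOIN nodes caller ON caller.id = u.src
ORDER BY u.depth ASC, caller.name`;

const callers = db.prepare(TWO_HOP).all() as Array<{ name: string; depth: number }>;
for (const c of callers) console.log(`${c.depth}: ${c.name}`);
```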

6 changes: 3 additions & 3 deletions .claude/skills/opencodehub-exploring/SKILL.md
@@ -75,9 +75,9 @@ When a name is ambiguous, `context` returns a ranked candidate list instead of s

```sql
SELECT r.step, callee.name, callee.file_path, callee.start_line
FROM relations r
JOIN nodes proc ON proc.id = r.from_id
JOIN nodes callee ON callee.id = r.to_id
FROM edges r
JOIN nodes proc ON proc.id = r.src
JOIN nodes callee ON callee.id = r.dst
WHERE r.type = 'PROCESS_STEP'
AND proc.kind = 'Process'
AND proc.name = 'CheckoutFlow'
162 changes: 103 additions & 59 deletions .claude/skills/opencodehub-guide/SKILL.md
@@ -5,7 +5,7 @@ description: "Use when the user asks about OpenCodeHub itself — available MCP

# OpenCodeHub Guide

Quick reference for every OpenCodeHub MCP tool, MCP resource, and the graph + temporal store schema.
Quick reference for every OpenCodeHub MCP tool, MCP resource, and the single-file `store.sqlite` schema.

## Always Start Here

@@ -59,7 +59,7 @@ standalone artifact producer with its own preconditions and output path.
| `mcp__codehub__context` | 360-degree symbol view + `confidenceBreakdown` + `cochanges` side-section |
| `mcp__codehub__impact` | Blast radius with risk tier + `confidenceBreakdown` |
| `mcp__codehub__detect_changes` | Map an uncommitted or committed diff to affected symbols and flows |
| `mcp__codehub__sql` | Read-only query: `sql` arg → temporal DuckDB (cochanges/summaries); `cypher` arg → lbug graph (5 s timeout) |
| `mcp__codehub__sql` | Read-only SQL over the single-file `store.sqlite` (all tables: nodes, edges, embeddings, cochanges, symbol_summaries, store_meta; 5 s timeout). `cypher` arg is reserved for community-fork adapters (unsupported by the default backend) |
| `mcp__codehub__signature` | Symbol declaration + stubbed members (class/interface header + method/property signatures, bodies elided) |

### HTTP / RPC surface
@@ -115,91 +115,135 @@ Lightweight reads for navigation (every URI uses the `codehub://` scheme):
| `codehub://repo/{name}/context` | Stats + staleness envelope |
| `codehub://repo/{name}/schema` | Live node kinds / relation types for `sql` |

> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or Cypher (below) filtered to `kind = 'Community'` / `kind = 'Process'`.

## Where the graph lives (ADR 0016)

There are **two stores**, and they are queried differently:

- **Graph tier — `graph.lbug`** (ladybug, Cypher dialect). Holds nodes, edges,
and embeddings. Query it via the typed tools (`query` / `context` / `impact` /
`route_map` / …) or, for bespoke questions, **Cypher** via the MCP `sql`
tool's `cypher` argument. There is NO `nodes` or `relations` SQL table.
- **Temporal tier — `temporal.duckdb`** (DuckDB SQL). Holds only the
`cochanges` and `symbol_summaries` tables. The `sql` argument of the MCP
`sql` tool (and `codehub sql` on the CLI) targets THIS store.

Pass exactly one of `sql` (temporal DuckDB) or `cypher` (lbug graph) to the MCP
`sql` tool.

### Graph schema (lbug / Cypher)

One node label `CodeNode` carrying `kind` as a **property** (NOT a per-kind
label). One relationship table per relation type. Properties are **snake_case**
(`file_path`, `start_line`, `inferred_label`, `step_count`, `entry_point_id`);
a camelCase RETURN alias comes back as the alias you give it, but the stored
property names are snake_case.

**Node kinds** (`n.kind` values): File, Folder, Function, Class, Method,
> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or a `sql` query (below) filtered to `kind = 'Community'` / `kind = 'Process'`.

## Where the index lives (ADR 0019)

There is **one store**: a single-file `<repo>/.codehub/store.sqlite`
(WAL, via Node's built-in `node:sqlite`). ADR 0019 supersedes ADR 0016:
the old two-tier backend (a `graph.lbug` Ladybug graph plus a
`temporal.duckdb` DuckDB file) is gone. One `SqliteStore` class implements
both the graph and temporal surfaces over that single file.

Everything is directly SQL-queryable through the MCP `sql` tool's `sql`
argument (and `codehub sql` on the CLI):

- **Graph tables (`nodes` and `edges`).** `nodes` holds the typed base
columns plus a `payload` JSON overflow; `edges` is one polymorphic table
keyed by `(src, dst, type, step)`. Query them via the typed tools
(`query` / `context` / `impact` / `route_map` / …) or, for bespoke
questions, plain SQL. Multi-hop traversal is a recursive SQL CTE over
`edges`, NOT Cypher.
- **Embeddings (the `embeddings` table).** Vectors live in a BLOB column;
there is NO Parquet sidecar (it was dropped with DuckDB).
- **Temporal tables (`cochanges` and `symbol_summaries`).** Same file, no
second engine.
- **`store_meta`.** Index metadata (graph hash, timestamps).
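A minimal sketch of keeping vectors in a BLOB column and running an exact nearest-neighbour scan over them. This is not necessarily how `SqliteStore` implements vector KNN: the `embeddings(node_id, vector)` layout, the raw `Float32Array` byte encoding (platform byte order, little-endian on common hardware), the brute-force cosine scan in TypeScript, and the node ids are all assumptions made for illustration.

```typescript
import { DatabaseSync } from "node:sqlite";

// Assumed layout; the real `embeddings` table may use different columns.
const db = new DatabaseSync(":memory:");
db.exec("CREATE TABLE embeddings (node_id TEXT PRIMARY KEY, vector BLOB NOT NULL)");

// Encode a vector as raw float32 bytes; node:sqlite binds Uint8Array as BLOB.
function toBlob(v: number[]): Uint8Array {
  const f = Float32Array.from(v);
  return new Uint8Array(f.buffer, f.byteOffset, f.byteLength);
}

// Copy the BLOB first so the Float32Array view starts at byte offset 0.
function fromBlob(b: Uint8Array): Float32Array {
  return new Float32Array(new Uint8Array(b).buffer);
}

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact k-NN: score every stored vector, keep the k most similar.
function knn(query: number[], k: number): Array<{ nodeId: string; score: number }> {
  const q = Float32Array.from(query);
  const rows = db.prepare("SELECT node_id, vector FROM embeddings").all() as
    Array<{ node_id: string; vector: Uint8Array }>;
  return rows
    .map((r) => ({ nodeId: r.node_id, score: cosine(q, fromBlob(r.vector)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const insert = db.prepare("INSERT INTO embeddings (node_id, vector) VALUES (?, ?)");
insert.run("fn:validateUser", toBlob([1, 0, 0]));
insert.run("fn:hashPassword", toBlob([0.9, 0.1, 0]));
insert.run("fn:renderPage", toBlob([0, 0, 1]));

console.log(knn([1, 0, 0], 2));
```

An exact scan is linear in the number of stored vectors; it is shown here only because it needs nothing beyond `node:sqlite`.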

Full-text search is BM25 via a SQLite FTS5 virtual table (`nodes_fts`).
The `cypher` argument to the MCP `sql` tool is **reserved for community-fork
graph adapters** (AGE / Memgraph / Neo4j / Neptune) and is **NOT supported
by the default SQLite backend**, so pass `sql` for every query against the
default store.
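A minimal sketch of BM25 ranking with FTS5 through `node:sqlite`. The `nodes_fts(name, file_path)` column list and the rows are assumptions: the PR names the table but not its columns, and the real table may be an external-content table over `nodes`. FTS5's `bm25()` returns lower (more negative) values for better matches, so results are ordered ascending.

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
// Assumed columns; only the table name `nodes_fts` comes from the docs.
db.exec(`
  CREATE VIRTUAL TABLE nodes_fts USING fts5(name, file_path);
  INSERT INTO nodes_fts (name, file_path) VALUES
    ('validatePayment', 'src/payments/validate.ts'),
    ('refundPayment', 'src/payments/refund.ts'),
    ('renderHeader', 'src/ui/header.ts');
`);

// A bare token in the MATCH string matches in any indexed column.
const search = db.prepare(`
  SELECT name, file_path, bm25(nodes_fts) AS score
  FROM nodes_fts
  WHERE nodes_fts MATCH ?
  ORDER BY score
  LIMIT 10`);

const hits = search.all("payments") as Array<{ name: string; score: number }>;
for (const h of hits) console.log(h.name, h.score.toFixed(3));
```

With the default `unicode61` tokenizer, `/` and `.` act as separators, so `payments` matches through the `file_path` column even though neither `name` contains that token.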

### Graph schema (`nodes` / `edges` tables)

The `nodes` table carries typed base columns (`id`, `kind`, `name`,
`file_path`, `start_line`, `end_line`) plus a `payload` JSON column holding
every kind-specific field. Reach payload fields with SQLite JSON1:
`payload->>'$.inferredLabel'`, `payload->>'$.stepCount'`,
`payload->>'$.entryPointId'`, `payload->>'$.cohesion'`,
`payload->>'$.symbolCount'`.
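The JSON1 pattern can be exercised end to end against an in-memory database. A minimal sketch, assuming a stripped-down `nodes` table and invented Community payloads; the `->>` operator needs SQLite 3.38.0 or later.

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
db.exec("CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT, name TEXT, payload TEXT)");

const insert = db.prepare("INSERT INTO nodes (id, kind, name, payload) VALUES (?, ?, ?, ?)");
// Invented fixtures: kind-specific fields live in the JSON payload.
insert.run("c1", "Community", "storage",
  JSON.stringify({ inferredLabel: "Storage", cohesion: 0.82, symbolCount: 41 }));
insert.run("c2", "Community", "mcp",
  JSON.stringify({ inferredLabel: "MCP server", cohesion: 0.91, symbolCount: 57 }));
insert.run("f1", "Function", "validateUser", JSON.stringify({}));

// `->>` returns an SQL value (REAL, INTEGER or TEXT), so ORDER BY is numeric.
const top = db.prepare(`
  SELECT name,
         payload->>'$.inferredLabel' AS label,
         payload->>'$.cohesion' AS cohesion,
         payload->>'$.symbolCount' AS symbols
  FROM nodes
  WHERE kind = 'Community'
  ORDER BY cohesion DESC`).all() as
  Array<{ name: string; label: string; cohesion: number; symbols: number }>;

console.log(top);
```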

The `edges` table is polymorphic: `src`, `dst`, `type`, `confidence`,
`step`, `reason`. The relation kind lives in the `type` column (there is no
per-type table).
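One consequence of keying `edges` by `(src, dst, type, step)` is that the same pair of nodes can carry several edges as long as the type or step differs, while an exact repeat is rejected. A minimal sketch, assuming the key is declared as a composite `PRIMARY KEY` (the PR says "keyed by" but does not show the DDL); the node ids and `reason` strings are invented, and `INSERT OR IGNORE` is one way a writer could make re-ingestion idempotent, not necessarily what `SqliteStore` does.

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
// Assumed DDL. `step` is NOT NULL with a 0 default because SQLite treats
// NULLs as distinct inside a PRIMARY KEY / UNIQUE constraint, which would
// let exact-duplicate non-process edges slip through.
db.exec(`
  CREATE TABLE edges (
    src TEXT NOT NULL,
    dst TEXT NOT NULL,
    type TEXT NOT NULL,
    confidence REAL,
    step INTEGER NOT NULL DEFAULT 0,
    reason TEXT,
    PRIMARY KEY (src, dst, type, step)
  )`);

const add = db.prepare(
  "INSERT OR IGNORE INTO edges (src, dst, type, confidence, step, reason) VALUES (?, ?, ?, ?, ?, ?)",
);

add.run("proc:Checkout", "fn:validateCart", "PROCESS_STEP", 1.0, 1, "process");
add.run("proc:Checkout", "fn:validateCart", "PROCESS_STEP", 1.0, 3, "process"); // new step: kept
add.run("fn:checkout", "fn:validateCart", "CALLS", 0.95, 0, "scip:ref");
add.run("fn:checkout", "fn:validateCart", "CALLS", 0.95, 0, "scip:ref"); // exact repeat: ignored

const counts = db.prepare(
  "SELECT type, COUNT(*) AS n FROM edges GROUP BY type ORDER BY type",
).all() as Array<{ type: string; n: number }>;
console.log(counts);
```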

**Node kinds** (`kind` values): File, Folder, Function, Class, Method,
Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait,
Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate,
Annotation, Template, Module, CodeElement, Community, Process, Route, Tool,
Finding, Dependency, Contributor, Repo, ProjectProfile, Section.

**Relationship types** (each is its own edge label): CONTAINS, DEFINES, IMPORTS,
**Relationship types** (`edges.type` values): CONTAINS, DEFINES, IMPORTS,
CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES,
OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES,
HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON,
OWNED_BY.

Cochanges live only in the **temporal** `cochanges` table (DuckDB SQL), never as
graph edges.
Cochanges live only in the `cochanges` table, never as graph edges.

## Cypher cheat-sheet (MCP `sql` tool, `cypher` arg)
## SQL cheat-sheet (MCP `sql` tool, `sql` arg)

All inbound callers of a function by name:
All inbound callers of a function by name (join `edges` to `nodes` on both
endpoints):

```cypher
MATCH (caller:CodeNode)-[r:CALLS]->(callee:CodeNode)
```sql
SELECT caller.name AS name, caller.file_path AS file,
caller.start_line AS line, e.confidence AS confidence,
e.reason AS reason
FROM edges e
JOIN nodes caller ON caller.id = e.src
JOIN nodes callee ON callee.id = e.dst
WHERE callee.name = 'validateUser' AND callee.kind = 'Function'
RETURN caller.name AS name, caller.file_path AS file, caller.start_line AS line,
r.confidence AS confidence, r.reason AS reason
ORDER BY r.confidence DESC
LIMIT 50
AND e.type = 'CALLS'
ORDER BY e.confidence DESC
LIMIT 50;
```

Top communities by cohesion:
Top communities by cohesion (kind-specific fields via JSON1):

```cypher
MATCH (n:CodeNode)
WHERE n.kind = 'Community'
RETURN n.name AS name, n.inferred_label AS label, n.cohesion AS cohesion,
n.symbol_count AS symbols
ORDER BY n.cohesion DESC
LIMIT 20
```sql
SELECT name,
payload->>'$.inferredLabel' AS label,
payload->>'$.cohesion' AS cohesion,
payload->>'$.symbolCount' AS symbols
FROM nodes
WHERE kind = 'Community'
ORDER BY cohesion DESC
LIMIT 20;
```

Process entry points:

```cypher
MATCH (n:CodeNode)
WHERE n.kind = 'Process'
RETURN n.name AS name, n.inferred_label AS label, n.step_count AS steps,
n.entry_point_id AS entry_point
ORDER BY n.step_count DESC
```sql
SELECT name,
payload->>'$.inferredLabel' AS label,
payload->>'$.stepCount' AS steps,
payload->>'$.entryPointId' AS entry_point
FROM nodes
WHERE kind = 'Process'
ORDER BY steps DESC;
```

SCIP-confirmed CALLS edges only (strict impact):

```cypher
MATCH ()-[r:CALLS]->()
WHERE r.confidence >= 0.95 AND r.reason STARTS WITH 'scip:'
RETURN r
```sql
SELECT * FROM edges
WHERE type = 'CALLS'
AND confidence >= 0.95
AND reason LIKE 'scip:%';
```

Multi-hop blast radius is a recursive CTE over `edges`. The typed `impact`
tool wraps this, so prefer it unless you need a bespoke traversal:

```sql
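WITH RECURSIVE reach(id, depth) AS (
-- seed: the symbol whose blast radius we want, at depth 0
SELECT id, 0 FROM nodes WHERE name = 'validateUser'
UNION
-- step: every node that CALLS or REFERENCES an already-reached node
SELECT e.src, r.depth + 1
FROM edges e JOIN reach r ON e.dst = r.id
WHERE e.type IN ('CALLS', 'REFERENCES') AND r.depth < 3 -- hop bound; also stops cycles
)
-- a node reached along several paths keeps its shortest depth
SELECT n.name, n.file_path, MIN(r.depth) AS depth
FROM reach r JOIN nodes n ON n.id = r.id
GROUP BY n.id ORDER BY depth;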
```
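To see how the depth bound keeps the recursion finite even when the call graph has a cycle, the recursive CTE can be run over a toy graph with `node:sqlite`. The fixtures (including a `login` <-> `apiHandler` cycle) are invented, and the tables are trimmed to the columns the query touches.

```typescript
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync(":memory:");
// Trimmed stand-ins for `nodes` / `edges`; all fixtures are invented.
db.exec(`
  CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT, file_path TEXT);
  CREATE TABLE edges (src TEXT, dst TEXT, type TEXT);
  INSERT INTO nodes VALUES
    ('v', 'validateUser', 'src/auth.ts'),
    ('l', 'login', 'src/auth.ts'),
    ('h', 'apiHandler', 'src/api.ts'),
    ('r', 'router', 'src/router.ts'),
    ('m', 'main', 'src/main.ts');
  INSERT INTO edges VALUES
    ('l', 'v', 'CALLS'),
    ('h', 'l', 'CALLS'),
    ('l', 'h', 'CALLS'),      -- cycle: login <-> apiHandler
    ('r', 'h', 'REFERENCES'),
    ('m', 'r', 'CALLS');      -- four hops from validateUser, past the bound
`);

// Upstream traversal bounded at three hops, shortest depth per node.
const BLAST_RADIUS = `
  WITH RECURSIVE reach(id, depth) AS (
    SELECT id, 0 FROM nodes WHERE name = 'validateUser'
    UNION
    SELECT e.src, r.depth + 1
    FROM edges e JOIN reach r ON e.dst = r.id
    WHERE e.type IN ('CALLS', 'REFERENCES') AND r.depth < 3
  )
  SELECT n.name, n.file_path, MIN(r.depth) AS depth
  FROM reach r JOIN nodes n ON n.id = r.id
  GROUP BY n.id ORDER BY depth`;

const rows = db.prepare(BLAST_RADIUS).all() as Array<{ name: string; depth: number }>;
for (const row of rows) console.log(row.depth, row.name);
```

`login` is reached again at depth 3 through the cycle, but `MIN(r.depth)` keeps its depth-1 entry, and `main` never appears because it sits beyond the three-hop bound.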

### Temporal SQL cheat-sheet (MCP `sql` tool, `sql` arg)
### Co-change cheat-sheet (MCP `sql` tool, `sql` arg)

Tightest co-change pairs (DuckDB SQL — temporal store):
Tightest co-change pairs (`cochanges` table):

```sql
SELECT source_file, target_file, lift, cocommit_count