I build systems that make AI agents more reliable: memory, coordination, verification, model routing, and the little workflows that keep agents from confidently wandering into a wall.
The short version: three companies, six hackathon wins, 47 public GitHub repositories, and a current obsession with agent infrastructure that proves its own claims.
| Project | Description |
| --- | --- |
| KERNEL | Claude Code that learns from itself. Persistent SQLite memory, multi-agent orchestration, validation gates, on-demand skills, and an experiment engine for proving which rules actually work. |
| llm-bench | Practical workflow benchmarks for local and API-hosted language models, with programmatic verifiers and provider adapters. |
| model-familiarity-engine | Evidence-backed model cards from replayed known-outcome tasks and observed model behavior. |
| the-agent-library | A curated library of 34 portable skills for Claude, Codex, and other agents: verification, planning, research, writing, work management, code engineering, and shipping. |
| metabrain | Zero-dependency SQLite memory for agents. Patterns graduate into hypotheses; outcomes become experiments; proven lessons become preferences. |
| Substrate | Generative art gallery where Claude produces one self-contained interactive HTML piece per run through daily automation, in the lineage of Taper. Live at nexus-substrate.pages.dev, with 390+ pieces and counting. |
| latent-diagnostics | Representation-level analysis of LLMs via attribution graph geometry. Preserves both real task-domain signals and negative results that did not survive controls. |
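
The metabrain row describes patterns graduating as evidence accumulates. Here is a minimal sketch of that idea, not metabrain's actual schema: the `lessons` table, the stage names, the `PROMOTE_EVERY` threshold, and both function names are assumptions made for illustration.

```python
import sqlite3

# Assumed stage names and threshold, chosen only for illustration.
STAGES = ("pattern", "hypothesis", "preference")
PROMOTE_EVERY = 3  # net supporting outcomes needed per stage


def open_memory(path=":memory:"):
    """Create the hypothetical lessons table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lessons (
            id INTEGER PRIMARY KEY,
            claim TEXT NOT NULL UNIQUE,
            stage TEXT NOT NULL DEFAULT 'pattern',
            supports INTEGER NOT NULL DEFAULT 0,
            refutes INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def record_outcome(conn, claim, supported):
    """Log one observed outcome for a claim and return its resulting stage."""
    conn.execute("INSERT OR IGNORE INTO lessons (claim) VALUES (?)", (claim,))
    column = "supports" if supported else "refutes"  # fixed names, never user input
    conn.execute(
        f"UPDATE lessons SET {column} = {column} + 1 WHERE claim = ?", (claim,)
    )
    supports, refutes = conn.execute(
        "SELECT supports, refutes FROM lessons WHERE claim = ?", (claim,)
    ).fetchone()
    # The stage is derived from net evidence, so a claim can move down as well as up.
    net = max(supports - refutes, 0)
    stage = STAGES[min(net // PROMOTE_EVERY, len(STAGES) - 1)]
    conn.execute("UPDATE lessons SET stage = ? WHERE claim = ?", (stage, claim))
    conn.commit()
    return stage
```

Under these assumptions, three supporting outcomes for the same claim move it from `pattern` to `hypothesis`, and a later refuting outcome can move it back. Deriving the stage from counts, rather than storing a hand-set label, keeps every promotion traceable to recorded outcomes.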
Agents need evidence, not vibes.
Most AI systems still run like this: ask the model, hope the answer is good, iterate until it sounds right. I build the opposite shape:
- Memory that compounds: agent lessons stored as structured evidence, not loose notes.
- Contracts before code: scope, acceptance criteria, and failure modes written before implementation.
- Verification as default: tests, lint, adversarial review, and live checks before saying done.
- Benchmarks with verifiers: model comparison should be boring, reproducible, and inspectable.
- Model familiarity: models earn responsibilities through observed work, not benchmark gossip.
- Portable workflows: reusable skills that make agent behavior legible across tools.
The agent is not magic. The system around it can be.
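
To make "benchmarks with verifiers" concrete, here is a minimal sketch, not the llm-bench implementation. `Task`, `run_benchmark`, `fake_model`, and the two sample tasks are hypothetical names; a real setup would swap `fake_model` for a provider adapter.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    name: str
    prompt: str
    verify: Callable[[str], bool]  # programmatic check on the raw model output


def is_json(output: str) -> bool:
    """Return True only if the output parses as JSON."""
    try:
        json.loads(output)
        return True
    except ValueError:
        return False


def run_benchmark(model: Callable[[str], str], tasks: list[Task]) -> dict[str, bool]:
    """Run every task through the model and record pass/fail per task."""
    results = {}
    for task in tasks:
        try:
            results[task.name] = bool(task.verify(model(task.prompt)))
        except Exception:
            # A crash in the model or verifier counts as a failure, not a skip.
            results[task.name] = False
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real provider adapter; returns canned answers."""
    canned = {"What is 2 + 2?": "4", "Reply with valid JSON.": "not json"}
    return canned.get(prompt, "")


TASKS = [
    Task("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
    Task("json", "Reply with valid JSON.", is_json),
]

if __name__ == "__main__":
    print(run_benchmark(fake_model, TASKS))
```

Each result is a plain boolean keyed by task name, so two runs can be compared directly and any failure can be traced back to the exact verifier that rejected it.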
| Project | Description |
| --- | --- |
| latent-diagnostics | Measures computational regimes inside model internals. Strongest result: task-domain geometry survives length control; truthfulness detection does not. |
| universal-spectroscopy-engine | Spectroscopy-inspired local engine for diagnosing semantic drift and model blindness. |
| experiments | Append-only specimen archive for LLM experiments. |
| arbiter | Propositional logic validation and compression library. |
| ModelMind | Duolingo-style app for understanding how AI works, not how to prompt it. In beta on TestFlight and Google Play internal testing. |
| Brink Mind | iOS journaling, private AI conversation, and biometric insights. SwiftUI + HealthKit. |
| HeyContext | Multi-agent orchestration workspace with adaptive model/config selection and shared context. |
| HeyContent | Cross-platform memory architecture for creator context, later integrated into HeyContext. |
| HotAgents | Hotkey-triggered desktop agent that reads screenshots and helps explain, draft, code, and proofread. |
| Itinerator | AI-powered itinerary generator built with React, Firebase, and Cloudflare Pages. |
| Freetime | AI agents for coordinating social plans across interests, timing, and location. |
| conductor | MCP server bridging Claude Desktop and Claude Code. |
| kernel-cursor | KERNEL patterns adapted toward Cursor workflows. |
| armature-ai | AI optimization framework using evolutionary algorithms, bandits, and agent societies. |
| event-horizon | Physics-informed chaos-gated data vault experiments. |
| memory-pool | Structured persistent-memory interface. Memory is not a timeline. |
| go-voice | Voice-native CLI for Claude Code: speech-to-text input and text-to-speech output. |
Currently writing Intelligence Architecture, a principles-first book on building with AI.
Selected essays:
- Stop Copying Other People's AI Setups. Build One That's Actually Yours.
- What an AI Detector Actually Measures
- The Agent-Ready Web
- Self-Learning Agent Civilization
- How to Make Claude Code Actually Work
- Why Prompt Engineering Can't Fix Hallucinations
Python · TypeScript · Swift · Go
Next.js · React Native · FastAPI · SvelteKit
Claude Code · SQLite · SAEs · evals · multi-agent systems



