A self-hosted OTLP observability platform in a single Go binary — OTLP gRPC + HTTP ingest, GraphRAG-powered root-cause analysis, multi-tenant storage, and a built-in MCP server for AI agents.
For teams who want traces, logs, and metrics in one place without standing up a Collector + Prometheus + Loki + Tempo stack.
- Operators: docs/OPERATIONS.md covers first run, production checklist, backup, incident response, and upgrades.
- AI agents / contributors: CLAUDE.md covers architecture, GraphRAG, MCP tools, and conventions.
- Env reference: .env.example lists every supported environment variable with defaults.
# 1. Build
go build -o otelcontext .
# 2. Run with an API key (dev-friendly — SQLite, plaintext HTTP)
export API_KEY="$(openssl rand -hex 32)"
./otelcontext
The server listens on:
- OTLP gRPC: :4317
- HTTP API + OTLP HTTP + UI + MCP: :8080
- Prometheus: :8080/metrics/prometheus
- Probes: :8080/live, :8080/ready
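A script that starts the binary may want to wait for the readiness probe before sending data. The following is a minimal Go sketch of such a wait loop. It assumes that /ready answers 200 OK once the server can accept traffic, which is the usual convention but is not stated in this README. The helpers `probeURL` and `waitReady` are hypothetical names, and `main` runs against a local stand-in server so the example executes without OtelContext running.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// probeURL joins a base address such as "http://localhost:8080" with a probe path.
func probeURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

// waitReady polls <base>/ready until it returns 200 OK or ctx expires.
// Assumption: a 200 status means "ready"; any other status or error means "retry".
func waitReady(ctx context.Context, client *http.Client, base string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, probeURL(base, "/ready"), nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("server not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// Stand-in server that mimics only the /ready probe.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/ready" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := waitReady(ctx, srv.Client(), srv.URL, 200*time.Millisecond); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("ready")
}
```

Against a real instance, pass "http://localhost:8080" and `http.DefaultClient` instead of the stand-in server's URL and client.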
Send an OTLP log via HTTP:
curl -sS -X POST http://localhost:8080/v1/logs \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"resourceLogs": [{
"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "demo"}}]},
"scopeLogs": [{"logRecords": [{
"timeUnixNano": "'$(date +%s)000000000'",
"severityText": "INFO",
"body": {"stringValue": "hello otelcontext"}
}]}]
}]
}'
Then query it back:
curl -sS -H "Authorization: Bearer $API_KEY" \
"http://localhost:8080/api/logs?limit=5" | jq .
Default is SQLite (otelcontext.db in the working dir). Override via env vars:
# PostgreSQL
DB_DRIVER=postgres \
DB_DSN="host=localhost user=otel password=otel dbname=otelcontext port=5432 sslmode=disable" \
./otelcontext
# MySQL
DB_DRIVER=mysql \
DB_DSN="root:password@tcp(localhost:3306)/otelcontext?charset=utf8mb4&parseTime=True&loc=Local" \
./otelcontext
See .env.example for SQL Server and Azure Entra (passwordless Postgres) configurations.
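The fallback described above (SQLite with otelcontext.db unless DB_DRIVER and DB_DSN say otherwise) can be sketched in Go as follows. This is an illustration, not OtelContext's actual configuration code: `dbConfig` and `resolveDB` are hypothetical names, and requiring DB_DSN for non-SQLite drivers is an assumption made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// dbConfig holds the driver name and DSN a server would open.
type dbConfig struct {
	Driver string
	DSN    string
}

// resolveDB reads DB_DRIVER and DB_DSN through getenv.
// With nothing set it falls back to SQLite and otelcontext.db in the working dir.
// Assumption: a non-SQLite driver without a DSN is treated as a configuration error.
func resolveDB(getenv func(string) string) (dbConfig, error) {
	cfg := dbConfig{Driver: getenv("DB_DRIVER"), DSN: getenv("DB_DSN")}
	if cfg.Driver == "" {
		cfg.Driver = "sqlite"
	}
	if cfg.DSN == "" {
		if cfg.Driver != "sqlite" {
			return dbConfig{}, fmt.Errorf("DB_DSN is required when DB_DRIVER=%s", cfg.Driver)
		}
		cfg.DSN = "otelcontext.db"
	}
	return cfg, nil
}

func main() {
	cfg, err := resolveDB(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Print only the driver; a DSN may contain credentials.
	fmt.Println("driver:", cfg.Driver)
}
```

Passing `getenv` as a parameter keeps the function testable without touching the process environment.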
OtelContext auto-tunes itself by driver. The numbers below assume the
auto-flipped SQLite defaults (5% sampling baseline, STORE_MIN_SEVERITY=WARN,
3k metric cardinality cap, FTS5 enabled, 1 SQLite writer with WAL + 256 MB
page cache + 1 GB mmap). Postgres keeps the looser defaults.
| Workload | DB | Steady RSS | Notes |
|---|---|---|---|
| Dev / <10 services | SQLite | <500 MB | Default config; no tuning needed. |
| 50–120 services, 7-day retention | SQLite (auto-tuned) or Postgres | ~1.8 GB | SQLite survives this band on the auto-flipped defaults. |
| >120 services, or >7-day retention, or sustained 50+ writes/sec | Postgres | depends on host | SQLite's single-writer serialization becomes the bottleneck. |
OTELCONTEXT_ALLOW_SQLITE_PROD=false is the guardrail — APP_ENV=production with DB_DRIVER=sqlite refuses to start unless the operator opts in.
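The guardrail amounts to a small startup check. A minimal Go sketch of that logic follows. It assumes that setting OTELCONTEXT_ALLOW_SQLITE_PROD=true is how the operator opts in and that an unset DB_DRIVER counts as SQLite (the documented default). `checkSQLiteGuard` is a hypothetical name, not the project's function.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errSQLiteInProd = errors.New("refusing to start: SQLite with APP_ENV=production requires OTELCONTEXT_ALLOW_SQLITE_PROD=true")

// checkSQLiteGuard returns an error when production is combined with SQLite
// and the operator has not opted in.
func checkSQLiteGuard(env map[string]string) error {
	driver := env["DB_DRIVER"]
	if driver == "" {
		driver = "sqlite" // assumption: unset driver means the SQLite default
	}
	if env["APP_ENV"] != "production" || driver != "sqlite" {
		return nil
	}
	// An unset or malformed value parses as false, so the guard fails closed.
	allow, _ := strconv.ParseBool(env["OTELCONTEXT_ALLOW_SQLITE_PROD"])
	if allow {
		return nil
	}
	return errSQLiteInProd
}

func main() {
	blocked := map[string]string{"APP_ENV": "production", "DB_DRIVER": "sqlite"}
	allowed := map[string]string{"APP_ENV": "production", "DB_DRIVER": "sqlite", "OTELCONTEXT_ALLOW_SQLITE_PROD": "true"}
	fmt.Println("without opt-in:", checkSQLiteGuard(blocked))
	fmt.Println("with opt-in:", checkSQLiteGuard(allowed))
}
```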
See CLAUDE.md "SQLite per-driver defaults" for the full
table of which env vars get auto-overridden on SQLite, and the rationale
per entry.
OtelContext accepts OTLP gRPC on :4317 and OTLP HTTP on :8080/v1/{traces,logs,metrics}. Point any OpenTelemetry Collector (or SDK) at it:
exporters:
  otlp/otelcontext:
    endpoint: "localhost:4317"
    tls:
      insecure: true
service:
  pipelines:
    traces:
      exporters: [otlp/otelcontext]
    logs:
      exporters: [otlp/otelcontext]
    metrics:
      exporters: [otlp/otelcontext]
See docs/otel-collector-example.yaml for a complete example.
- OTLP gRPC + HTTP ingest: traces, logs, metrics; gzip and protobuf/JSON supported. Hybrid backpressure (90% soft-drop, 100% reject) prevents queue OOMs.
- GraphRAG: layered in-memory graph with error_chain, impact_analysis, root_cause_analysis, and anomaly-correlation queries.
- Drain log clustering: deterministic template mining, persisted across restarts.
- MCP server: 7-tool triage surface for AI agents over JSON-RPC 2.0 + SSE (get_anomaly_timeline, get_service_map, get_service_health, root_cause_analysis, impact_analysis, trace_graph, search_logs). Per-call deadlines, concurrency semaphore, 5 s TTL cache for cheap in-memory tools, SSE keep-alives every 25 s.
- Log search: SQLite FTS5 (BM25-ranked) on by default; pg_trgm GIN on Postgres; LIKE fallback. search_logs is 24-hour-capped to bound the worst-case scan.
- Multi-tenancy: per-row tenant_id, X-Tenant-ID header / x-tenant-id gRPC metadata, per-tenant cardinality caps.
- Adaptive sampling: always-on for errors and slow spans, probabilistic otherwise (defaults to 5% on SQLite, 100% on Postgres).
- Auto-tuned SQLite path: fail-closed PRAGMA stanza (WAL, NORMAL sync, 256 MB cache, 1 GB mmap, 64 MB WAL cap) + 9 per-driver config defaults so single-binary deploys survive 120 services on a 4 GB host.
- DLQ: durable typed envelopes with disk-bounded replay.
- Self-instrumentation: export OtelContext's own spans via OTEL_EXPORTER_OTLP_ENDPOINT. Loopback guard prevents recursive feedback.
See SECURITY.md for the vulnerability reporting process. The security posture (OSV-Scanner, Trivy, Semgrep, Gitleaks, jscpd, SBOM, Scorecard) is described in CLAUDE.md under "Security & Supply Chain".
See LICENSE.md.