Bug-Finderr · Jun 22, 2026
diff --git a/.gitignore b/.gitignore
@@ -43,3 +43,7 @@ CLAUDE.md
 
 # claude local settings
 .claude/settings.local.json
+
+# python venv for the separate-runner compat tests + caches
+.venv/
+__pycache__/
diff --git a/README.md b/README.md
@@ -1,38 +1,53 @@
 # api-proxy
 
-A single Cloudflare Worker that reverse-proxies the OpenAI, Anthropic, and Google Gemini APIs behind **revocable "doppelganger" tokens**. You issue tokens from an admin dashboard and hand them out; each token is validated server-side and swapped for the real provider key before the request is forwarded. Consumers never see your real keys, and you can scope or revoke any token at any time.
+A single Cloudflare Worker that reverse-proxies the OpenAI, Anthropic, and Google Gemini APIs — over **HTTP and WebSocket** — behind **revocable proxy tokens**. You issue tokens from an admin dashboard and hand them out; each token is validated server-side and swapped for the real provider key before the request is forwarded. Consumers never see your real keys, and you can scope or revoke any token at any time.
 
-The consumer changes only **two things** in their normal SDK: the base URL (point at your worker) and the API key (use a doppelganger token).
+## Use it
 
-## How it works
-
-The doppelganger token rides in the SDK's normal auth slot. The worker reads it, validates it against KV, checks the token is scoped to the requested provider, strips every inbound auth header, sets the one real key, and forwards the request (path + query verbatim, streaming included).
-
-| Token arrives in | Provider | Upstream | Real key set as |
-|---|---|---|---|
-| `Authorization: Bearer` | OpenAI | `api.openai.com` | `Authorization: Bearer` |
-| `x-api-key` | Anthropic | `api.anthropic.com` | `x-api-key` |
-| `x-goog-api-key` / `?key=` | Gemini | `generativelanguage.googleapis.com` | `x-goog-api-key` |
-| `Authorization: Bearer` + path `/v1beta/openai/*` | Gemini (OpenAI-compat) | `generativelanguage.googleapis.com` | `Authorization: Bearer` |
-
-## Client setup
-
-Point the SDK's base URL at the worker and use a doppelganger token as the key:
+Works with the official **OpenAI**, **Anthropic**, and **Google GenAI** SDKs (Python and Node) — and, since the worker routes by auth header and forwards verbatim, with anything that speaks those APIs: the Vercel AI SDK, LangChain, LiteLLM, OpenAI-compatible tools, or raw `curl`. A client changes only **two things**: the base URL and the API key (a proxy token).
 
-| SDK | base URL | key |
+| Client | base URL | API key |
 |---|---|---|
-| OpenAI (Python / Node) | `https://<worker>/v1` | token |
-| Anthropic (Python / Node) | `https://<worker>` (no `/v1`) | token |
-| Google `@google/genai` (Node) | `httpOptions.baseUrl = https://<worker>` | token |
-| Gemini from Python | point the **OpenAI** SDK at `https://<worker>/v1beta/openai` | token |
+| OpenAI SDK (Python / Node) | `https://<worker>/v1` | proxy token |
+| Anthropic SDK (Python / Node) | `https://<worker>` (no `/v1`) | proxy token |
+| Google `@google/genai` (Node) | `httpOptions.baseUrl = https://<worker>` | proxy token |
+| Gemini via the OpenAI SDK | `https://<worker>/v1beta/openai` | proxy token |
+
+```python
+# OpenAI SDK (Python); the Node SDK takes the same two values as `baseURL` / `apiKey`
+from openai import OpenAI
+client = OpenAI(base_url="https://<worker>/v1", api_key="<proxy-token>")
+client.chat.completions.create(
+    model="gpt-5.4", messages=[{"role": "user", "content": "Hello"}])
+```
+
+Or raw HTTP:
 
 ```bash
-# OpenAI-style
 curl https://<worker>/v1/chat/completions \
-  -H "authorization: Bearer <token>" -H "content-type: application/json" \
+  -H "authorization: Bearer <proxy-token>" -H "content-type: application/json" \
   -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"Hello"}]}'
 ```
 
+Browser apps work too — the worker answers the CORS preflight and reflects the request Origin (provider browser opt-ins still apply, e.g. Anthropic's `dangerouslyAllowBrowser`).
+
+### WebSocket / realtime
+
+Realtime sockets proxy the same way — point the WebSocket at the worker and use a proxy token. The worker swaps the token for the real key on the upgrade handshake.
+
+| WebSocket API | URL | token slot |
+|---|---|---|
+| OpenAI Realtime (server) | `wss://<worker>/v1/realtime?model=…` | `Authorization: Bearer <proxy-token>` |
+| OpenAI Realtime (browser) | `wss://<worker>/v1/realtime?model=…` | `Sec-WebSocket-Protocol: realtime, openai-insecure-api-key.<proxy-token>` |
+| OpenAI Responses (WebSocket mode) | `wss://<worker>/v1/responses` | `Authorization: Bearer <proxy-token>` |
+| Gemini Live | `wss://<worker>/ws/…BidiGenerateContent?key=<proxy-token>` | `?key=` query |
+
+A browser can't set the `Authorization` header on a WebSocket, so OpenAI smuggles the key in the `openai-insecure-api-key.` subprotocol — the worker reads it there and re-presents it as a Bearer header upstream. Anthropic has no WebSocket API. A long-lived socket is rate-limited and validated **once at connect**, so a revoke applies to the next connection, not an open stream.
+
+## How it works
+
+The proxy token rides in the SDK's normal auth slot. The worker validates it, checks it's scoped to the requested provider, strips every inbound auth header, sets the one real key, and forwards the request (path + query verbatim, streaming included). Routing is by which auth header the token arrives in — see [docs/architecture.md](docs/architecture.md) for the routing table and full design.
+
 ## Setup
 
 ```bash
@@ -55,41 +70,47 @@ Optional plain vars (NOT secrets) override the upstreams; they default to the re
 
 ## Admin dashboard
 
-Visit `https://<worker>/admin`, sign in with `ADMIN_SECRET`, and create tokens: give each a label, the providers it may use (OpenAI / Anthropic / Gemini), and either type a token or generate one. The token is shown **once** at creation — copy it then; only its SHA-256 hash is stored. Disable or delete any token instantly.
+Visit `https://<worker>/admin`, sign in with `ADMIN_SECRET`, and create tokens: give each a label, the providers it may use (OpenAI / Anthropic / Gemini), an optional expiry, and either type a token or generate one. The token is shown **once** at creation — copy it then; only its SHA-256 hash is stored. Disable or delete any token instantly.
+
+## Per-token controls
+
+- **Expiry** — optionally set an expiry at creation; past it the token is rejected and the dashboard shows it as `expired`.
+- **Rate limit** — each token is capped at 100 requests / 60s (`429` + `Retry-After` over the limit). Tune `[[ratelimits]]` in `wrangler.toml`. It is a per-colo, loose ceiling for abuse protection, not a strict quota.
+- **Scope & revoke** — a token only reaches the providers you check; disable or delete to revoke (KV propagation is up to ~60s).
 
 ## Security
 
 - Real provider keys are Cloudflare secrets, injected only into outbound requests — never in KV, never returned to callers.
 - Tokens are stored as SHA-256 hashes; a KV/dashboard dump yields unusable hashes, not live tokens.
-- The worker strips all inbound auth headers before setting the real key, so a doppelganger token is never forwarded upstream.
+- The worker strips all inbound auth headers before setting the real key, so a proxy token is never forwarded upstream.
 - Do not host the worker on a `*.openai.azure.com` / `*.cognitiveservices.azure.com` domain (the OpenAI SDK switches to Azure auth on those hostnames).
 
 ## Testing
 
-Two tiers (Vitest):
-
 ```bash
 nub run test:unit     # tier 1: proxy logic in workerd (vitest-pool-workers), fast CI gate
-nub run test:compat   # tier 2: real openai / @anthropic-ai/sdk / @google/genai SDKs vs a local worker + mock upstream
-nub run test          # both
+nub run test:compat   # tier 2: real client libs (official SDKs, Vercel AI SDK, LangChain, Genkit) + raw fetch + a real wss round-trip vs a mock upstream
+nub run test:py       # tier 2 (Python): LiteLLM, LlamaIndex, instructor, Pydantic AI through the worker (needs the venv below)
+nub run test          # all of the above
 ```
 
-Tier 2 starts the real worker (`unstable_dev`) with `*_UPSTREAM` pointed at a `node:http` mock, seeds a token via the admin API, then drives each real SDK and asserts the forwarded request carries the real key (and never the token).
+Tier 2 starts the real worker (`unstable_dev`) with `*_UPSTREAM` pointed at a `node:http` mock, seeds a token via the admin API, drives each real client, and asserts the forwarded request carries the real key (and never the token). The Python runner (`test/run-py.mjs`) owns the same worker + mock and runs each `*.py` as a thin client. **Each file in `test/sdk-compat/` is named after the package it drives and doubles as a usage example** — copy the `baseURL`/`apiKey` wiring from the file matching your client (e.g. `ai-sdk-openai.ts`, `langchain-anthropic.ts`, `genkit.ts`, `pydantic-ai.py`), from `fetch.ts` for raw HTTP, or from `websocket.ts` for a wss client.
 
-## Disable / Enable
+**What's tested, and what's by-construction.** The worker routes by *which auth slot a request uses*, not by SDK — so a provider's packages behave identically once the slot is fixed. We therefore test **each distinct library once, in one language** — the official `openai` / `@anthropic-ai/sdk` / `@google/genai` SDKs, the Vercel AI SDK, LangChain, Genkit, LiteLLM, LlamaIndex, instructor, and Pydantic AI (see `test/sdk-compat/`) — and treat the rest as compatible-by-construction: a tested SDK's other-language packages (`openai-python`/`-go`/`-java`/...), end-user apps (Aider, Cline, Continue, Open WebUI), and JVM/.NET frameworks (Spring AI, Semantic Kernel) each reuse a slot already proven. The per-provider proof matrix is in [docs/learnings/compat-is-the-auth-slot-not-the-sdk.md](docs/learnings/compat-is-the-auth-slot-not-the-sdk.md). Two gotchas: use Anthropic's normal API-key mode (its OAuth `authToken` mode sends `Bearer`, which would route to OpenAI), and the legacy `google-generativeai` Python SDK needs `transport="rest"` (it defaults to gRPC and won't traverse an HTTP proxy otherwise).
 
-`schedule.sh` toggles the worker's `workers_dev` URL without deleting it:
+The Python runner uses a local venv. One-time setup with [uv](https://docs.astral.sh/uv/):
 
 ```bash
-./schedule.sh disable          # now
-./schedule.sh disable +30m     # in 30 minutes
-./schedule.sh enable 22:00     # at 10pm
+uv venv
+uv pip install -r test/requirements.txt
 ```
 
+> **Gemini is untested with the actual API.** No test hits a live provider — all three run against a mock upstream. OpenAI and Anthropic are additionally verified live in deployment; Gemini is **not**, because `GEMINI_API_KEY` isn't set yet, so the Gemini route has never run against the real Google Generative Language API. Treat it as built-but-unproven until a key is added.
+
 ## Cost
 
 Cloudflare Workers free tier covers this (100k requests/day). You only pay upstream providers for API usage.
 
 ## Contributing
 
-Issues are welcome. PRs are not accepted and will be auto-closed.
+Issues are welcome. External PRs are not accepted and will be auto-closed.
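The README changes above describe routing by whichever auth slot the proxy token arrives in (the routing table now lives in docs/architecture.md). A minimal TypeScript sketch of that idea follows. It is not the worker's actual code: `Provider`, `Inbound`, `Route`, and `pickRoute` are hypothetical names, the check order is an assumption, and token validation, scope checks, and rate limiting are left out.

```typescript
// Illustrative sketch only. All names here are hypothetical and are not
// taken from the worker's source.
type Provider = "openai" | "anthropic" | "gemini";

interface Inbound {
  path: string; // e.g. "/v1/chat/completions"
  headers: Record<string, string | undefined>; // header names already lowercased
  queryKey?: string; // value of the ?key= query parameter, if any
}

interface Route {
  provider: Provider;
  token: string; // proxy token found in the auth slot
  upstreamHeader: string; // header that would carry the real key upstream
}

const WS_KEY_PREFIX = "openai-insecure-api-key.";

function pickRoute(req: Inbound): Route | null {
  const h = req.headers;

  // x-api-key is Anthropic's slot.
  const anthropicKey = h["x-api-key"];
  if (anthropicKey) {
    return { provider: "anthropic", token: anthropicKey, upstreamHeader: "x-api-key" };
  }

  // x-goog-api-key or ?key= is Gemini's slot (header-first is an assumption).
  const googleKey = h["x-goog-api-key"] ?? req.queryKey;
  if (googleKey) {
    return { provider: "gemini", token: googleKey, upstreamHeader: "x-goog-api-key" };
  }

  // Bearer is OpenAI's slot, except on Gemini's OpenAI-compatible path.
  const auth = h["authorization"] ?? "";
  if (auth.toLowerCase().startsWith("bearer ")) {
    const token = auth.slice("bearer ".length).trim();
    if (token) {
      const geminiCompat = req.path.startsWith("/v1beta/openai/");
      return {
        provider: geminiCompat ? "gemini" : "openai",
        token,
        upstreamHeader: "authorization",
      };
    }
  }

  // Browser WebSockets cannot set Authorization, so the token rides in a
  // subprotocol entry and is re-presented upstream as a Bearer header.
  const entry = (h["sec-websocket-protocol"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .find((s) => s.startsWith(WS_KEY_PREFIX));
  if (entry && entry.length > WS_KEY_PREFIX.length) {
    return {
      provider: "openai",
      token: entry.slice(WS_KEY_PREFIX.length),
      upstreamHeader: "authorization",
    };
  }

  return null; // no recognised auth slot
}
```

With this shape, a request carrying `authorization: Bearer <proxy-token>` to `/v1/chat/completions` resolves to `openai`, while the same header on `/v1beta/openai/chat/completions` resolves to `gemini`. This is also why the README warns that Anthropic's OAuth `authToken` mode, which sends `Bearer`, would be routed to OpenAI.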
diff --git a/schedule.sh → _legacy/schedule.sh b/schedule.sh → _legacy/schedule.sh
@@ -10,9 +10,9 @@ for arg in "$@"; do
     *) time_args+=("$arg") ;;
   esac
 done
-# Default target is the single token-gated worker (wrangler.toml). The provider flags
-# still target the legacy per-provider workers during the transition.
-config="${label:+wrangler.${label}.toml}"; config="${config:-wrangler.toml}"
+# Archived helper (kept aside in _legacy/). Default target is the active token-gated worker at the
+# repo root (wrangler.toml); the provider flags target the archived v1 workers in _legacy/v1/.
+config="${label:+_legacy/v1/wrangler.${label}.toml}"; config="${config:-wrangler.toml}"
 name="${label:-api-proxy}"
 time_arg="${time_args[*]:-}"
 
@@ -21,7 +21,7 @@ time_arg="${time_args[*]:-}"
 }
 
 [[ "$action" == "enable" ]] && from=false to=true || from=true to=false
-dir=$(cd "$(dirname "$0")" && pwd)
+dir=$(cd "$(dirname "$0")/.." && pwd) # repo root (this script lives in _legacy/)
 
 if [[ -z "$time_arg" ]]; then
   sed -i '' "s/workers_dev = $from/workers_dev = $to/" "$dir/$config"

diff --git a/_legacy/v1/README.md b/_legacy/v1/README.md
@@ -0,0 +1,35 @@
+# v1 - one worker per provider (archived)
+
+The original proxy: **three separate Workers**, one per provider, each a thin pass-through
+that swaps the hostname and injects the real key from its own secret.
+
+```
+client ──▶ openai-proxy  ──(Bearer OPENAI_API_KEY)──▶ api.openai.com
+client ──▶ claude-proxy  ──(x-api-key ANTHROPIC_KEY)─▶ api.anthropic.com
+client ──▶ gemini-proxy  ──(x-goog-api-key GEMINI)──▶ generativelanguage.googleapis.com
+```
+
+| File | Worker | Upstream | Key slot it sets |
+|---|---|---|---|
+| `openai.ts` / `wrangler.openai.toml` | `openai-proxy` | api.openai.com | `Authorization: Bearer` |
+| `claude.ts` / `wrangler.claude.toml` | `claude-proxy` | api.anthropic.com | `x-api-key` |
+| `gemini.ts` / `wrangler.gemini.toml` | `gemini-proxy` | generativelanguage.googleapis.com | `x-goog-api-key` (and strips `?key=`) |
+
+## Why it was replaced
+
+- **No auth.** Each worker injected the real upstream key for *any* caller. Anyone who knew
+  the URL spent the key. There were no shareable, revocable tokens.
+- **Three deploys, three URLs.** Clients had to know which worker maps to which provider, and
+  each needed its own secret and deploy.
+
+v2 (the active root worker) collapses all three into **one** worker that routes by auth header,
+gates every request behind a hashed [proxy token](../../docs/learnings/proxy-token-security.md),
+and adds the [OpenAI geo-403 egress fix](../../docs/learnings/openai-egress-geo-block.md). See
+[provider routing by auth header](../../docs/learnings/provider-routing-by-auth-header.md) for how
+one base URL serves all three.
+
+## Status
+
+Reference only - **not deployed, not built, not tested.** Kept to document where the project
+started. The `main` paths in these tomls point at files in this folder, so each could still be
+deployed standalone (`wrangler deploy -c _legacy/v1/wrangler.openai.toml`) if ever needed.
diff --git a/src/claude.ts → _legacy/v1/claude.ts b/src/claude.ts → _legacy/v1/claude.ts
diff --git a/src/gemini.ts → _legacy/v1/gemini.ts b/src/gemini.ts → _legacy/v1/gemini.ts
diff --git a/src/openai.ts → _legacy/v1/openai.ts b/src/openai.ts → _legacy/v1/openai.ts
diff --git a/wrangler.claude.toml → _legacy/v1/wrangler.claude.toml b/wrangler.claude.toml → _legacy/v1/wrangler.claude.toml
@@ -1,5 +1,5 @@
 name = "claude-proxy"
-main = "src/claude.ts"
+main = "claude.ts"
 compatibility_date = "2025-01-01"
 workers_dev = true
 preview_urls = false
diff --git a/wrangler.gemini.toml → _legacy/v1/wrangler.gemini.toml b/wrangler.gemini.toml → _legacy/v1/wrangler.gemini.toml
@@ -1,5 +1,5 @@
 name = "gemini-proxy"
-main = "src/gemini.ts"
+main = "gemini.ts"
 compatibility_date = "2025-01-01"
 workers_dev = true
 preview_urls = false
diff --git a/wrangler.openai.toml → _legacy/v1/wrangler.openai.toml b/wrangler.openai.toml → _legacy/v1/wrangler.openai.toml
@@ -1,5 +1,5 @@
 name = "openai-proxy"
-main = "src/openai.ts"
+main = "openai.ts"
 compatibility_date = "2025-01-01"
 workers_dev = true
 preview_urls = false
diff --git a/biome.json b/biome.json
@@ -1,5 +1,5 @@
 {
-  "$schema": "https://biomejs.dev/schemas/2.5.0/schema.json",
+  "$schema": "https://biomejs.dev/schemas/2.5.1/schema.json",
   "vcs": {
     "enabled": true,
     "clientKind": "git",