22 commits
b2dad21
chore: rename token concept to "proxy token", archive v1 proxies, pre…
Bug-Finderr Jun 22, 2026
60445bd
feat: per-token expiry dates (check-at-validate)
Bug-Finderr Jun 22, 2026
7d05d45
feat: CORS preflight + reflect-Origin so browser SDKs work
Bug-Finderr Jun 22, 2026
4c928a3
feat: per-token RPM rate limiting (Workers Rate Limiting binding)
Bug-Finderr Jun 22, 2026
bfe4d3a
chore: address handoff items - archive schedule.sh, dashboard auto-re…
Bug-Finderr Jun 22, 2026
bdd0f02
test: rename leftover DOPPEL fixtures to PROXY-TOKEN
Bug-Finderr Jun 22, 2026
1074a1c
docs: add docs/architecture.md, capture v2.1 learnings, drop legacy f…
Bug-Finderr Jun 22, 2026
5df93ab
docs: replace broken ASCII diagrams in architecture.md with clean mar…
Bug-Finderr Jun 22, 2026
862c404
docs: consumer-first README + cleaner architecture.md
Bug-Finderr Jun 22, 2026
e084ce2
docs: drop §17 repo layout, keep the 5df93ab diagram formatting
Bug-Finderr Jun 22, 2026
39f4076
test: add raw-fetch + LiteLLM compat coverage; name sdk-compat files …
Bug-Finderr Jun 23, 2026
3f9abf8
docs: tighten architecture.md; add CORS-preflight/upload-passthrough …
Bug-Finderr Jun 23, 2026
f7a3af6
refactor(test): name sdk-compat files after their packages; move requ…
Bug-Finderr Jun 23, 2026
b3ea890
docs: prove SDK compat is the auth slot, not the SDK/language
Bug-Finderr Jun 23, 2026
71c1fdf
docs: fix review findings in compat learning
Bug-Finderr Jun 23, 2026
c86f499
docs: make the per-provider anchor test explicit in the compat learning
Bug-Finderr Jun 23, 2026
8369fd1
test: add per-library compat tests; fix test:py harness (no orphaned …
Bug-Finderr Jun 23, 2026
c2c334e
test: address review - pin python deps, harden py harness
Bug-Finderr Jun 23, 2026
48bc5d0
docs: use uv for the Python test venv setup (replaces pip)
Bug-Finderr Jun 23, 2026
01b8c6f
chore(test): bump python compat deps to latest
Bug-Finderr Jun 23, 2026
dcfb71c
feat: proxy WebSocket (wss) upgrades with the same token swap
Bug-Finderr Jul 3, 2026
ab529b3
docs: rm 100 col limitation
Bug-Finderr Jul 3, 2026
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -43,3 +43,7 @@ CLAUDE.md

# claude local settings
.claude/settings.local.json

# python venv for the separate-runner compat tests + caches
.venv/
__pycache__/
93 changes: 57 additions & 36 deletions README.md
@@ -1,38 +1,53 @@
# api-proxy

A single Cloudflare Worker that reverse-proxies the OpenAI, Anthropic, and Google Gemini APIs behind **revocable "doppelganger" tokens**. You issue tokens from an admin dashboard and hand them out; each token is validated server-side and swapped for the real provider key before the request is forwarded. Consumers never see your real keys, and you can scope or revoke any token at any time.
A single Cloudflare Worker that reverse-proxies the OpenAI, Anthropic, and Google Gemini APIs — over **HTTP and WebSocket** — behind **revocable proxy tokens**. You issue tokens from an admin dashboard and hand them out; each token is validated server-side and swapped for the real provider key before the request is forwarded. Consumers never see your real keys, and you can scope or revoke any token at any time.

The consumer changes only **two things** in their normal SDK: the base URL (point at your worker) and the API key (use a doppelganger token).
## Use it

## How it works

The doppelganger token rides in the SDK's normal auth slot. The worker reads it, validates it against KV, checks the token is scoped to the requested provider, strips every inbound auth header, sets the one real key, and forwards the request (path + query verbatim, streaming included).

| Token arrives in | Provider | Upstream | Real key set as |
|---|---|---|---|
| `Authorization: Bearer` | OpenAI | `api.openai.com` | `Authorization: Bearer` |
| `x-api-key` | Anthropic | `api.anthropic.com` | `x-api-key` |
| `x-goog-api-key` / `?key=` | Gemini | `generativelanguage.googleapis.com` | `x-goog-api-key` |
| `Authorization: Bearer` + path `/v1beta/openai/*` | Gemini (OpenAI-compat) | `generativelanguage.googleapis.com` | `Authorization: Bearer` |
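
The routing table above can be read as a small decision function. The sketch below is a Python illustration only (the worker itself is TypeScript); the function name `route` and the order in which slots are checked are assumptions, not taken from the repository.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def route(headers: Dict[str, str], url: str) -> Optional[Tuple[str, str]]:
    """Return (provider, token) based on which auth slot carries the token.

    Hypothetical helper: the check order here is an assumption.
    """
    h = {k.lower(): v for k, v in headers.items()}
    parts = urlsplit(url)

    auth = h.get("authorization", "")
    if auth[:7].lower() == "bearer ":
        token = auth[7:].strip()
        # Bearer normally means OpenAI; the /v1beta/openai/* path is
        # Gemini's OpenAI-compatible surface.
        if parts.path.startswith("/v1beta/openai/"):
            return "gemini", token
        return "openai", token

    if "x-api-key" in h:
        return "anthropic", h["x-api-key"]

    if "x-goog-api-key" in h:
        return "gemini", h["x-goog-api-key"]

    # Gemini clients may also pass the key as a ?key= query parameter.
    key = parse_qs(parts.query).get("key")
    if key:
        return "gemini", key[0]

    return None
```

For example, `route({"x-api-key": "<token>"}, "https://<worker>/v1/messages")` selects Anthropic without looking at the path, which is why one base URL can serve all three providers.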

## Client setup

Point the SDK's base URL at the worker and use a doppelganger token as the key:
Works with the official **OpenAI**, **Anthropic**, and **Google GenAI** SDKs (Python and Node) — and, since the worker routes by auth header and forwards verbatim, with anything that speaks those APIs: the Vercel AI SDK, LangChain, LiteLLM, OpenAI-compatible tools, or raw `curl`. A client changes only **two things**: the base URL and the API key (a proxy token).

| SDK | base URL | key |
| Client | base URL | API key |
|---|---|---|
| OpenAI (Python / Node) | `https://<worker>/v1` | token |
| Anthropic (Python / Node) | `https://<worker>` (no `/v1`) | token |
| Google `@google/genai` (Node) | `httpOptions.baseUrl = https://<worker>` | token |
| Gemini from Python | point the **OpenAI** SDK at `https://<worker>/v1beta/openai` | token |
| OpenAI SDK (Python / Node) | `https://<worker>/v1` | proxy token |
| Anthropic SDK (Python / Node) | `https://<worker>` (no `/v1`) | proxy token |
| Google `@google/genai` (Node) | `httpOptions.baseUrl = https://<worker>` | proxy token |
| Gemini via the OpenAI SDK | `https://<worker>/v1beta/openai` | proxy token |

```python
# OpenAI SDK (Python); Node is identical
from openai import OpenAI
client = OpenAI(base_url="https://<worker>/v1", api_key="<proxy-token>")
client.chat.completions.create(
model="gpt-5.4", messages=[{"role": "user", "content": "Hello"}])
```

Or raw HTTP:

```bash
# OpenAI-style
curl https://<worker>/v1/chat/completions \
-H "authorization: Bearer <token>" -H "content-type: application/json" \
-H "authorization: Bearer <proxy-token>" -H "content-type: application/json" \
-d '{"model":"gpt-5.4","messages":[{"role":"user","content":"Hello"}]}'
```

Browser apps work too — the worker answers the CORS preflight and reflects the request Origin (provider browser opt-ins still apply, e.g. Anthropic's `dangerouslyAllowBrowser`).
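
As a rough illustration of "answers the CORS preflight and reflects the request Origin", the Python sketch below builds the response headers. It is not the worker's code (which is TypeScript): the function names, the 204 status, and the choice to echo back the requested method and headers are assumptions.

```python
from typing import Dict, Optional, Tuple


def _lower(headers: Dict[str, str]) -> Dict[str, str]:
    """Normalize header names to lowercase for case-insensitive lookups."""
    return {k.lower(): v for k, v in headers.items()}


def cors_headers(request_headers: Dict[str, str]) -> Dict[str, str]:
    """Reflect the caller's Origin instead of sending a wildcard."""
    origin = _lower(request_headers).get("origin")
    if origin is None:
        return {}  # not a browser cross-origin request
    # Vary: Origin keeps caches from reusing one origin's response for another.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}


def preflight_response(
    method: str, request_headers: Dict[str, str]
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a CORS preflight, or None if it is not one."""
    h = _lower(request_headers)
    is_preflight = (
        method.upper() == "OPTIONS"
        and "origin" in h
        and "access-control-request-method" in h
    )
    if not is_preflight:
        return None
    headers = cors_headers(request_headers)
    # Assumption: allow exactly what the browser asked for.
    headers["Access-Control-Allow-Methods"] = h["access-control-request-method"]
    requested = h.get("access-control-request-headers")
    if requested:
        headers["Access-Control-Allow-Headers"] = requested
    return 204, headers
```

A preflight is answered by the proxy itself and never forwarded, so it carries no token and needs no validation.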

### WebSocket / realtime

Realtime sockets proxy the same way — point the WebSocket at the worker and use a proxy token. The worker swaps the token for the real key on the upgrade handshake.

| WebSocket API | URL | token slot |
|---|---|---|
| OpenAI Realtime (server) | `wss://<worker>/v1/realtime?model=…` | `Authorization: Bearer <proxy-token>` |
| OpenAI Realtime (browser) | `wss://<worker>/v1/realtime?model=…` | `Sec-WebSocket-Protocol: realtime, openai-insecure-api-key.<proxy-token>` |
| OpenAI Responses (WebSocket mode) | `wss://<worker>/v1/responses` | `Authorization: Bearer <proxy-token>` |
| Gemini Live | `wss://<worker>/ws/…BidiGenerateContent?key=<proxy-token>` | `?key=` query |

A browser can't set the `Authorization` header on a WebSocket, so OpenAI smuggles the key in the `openai-insecure-api-key.` subprotocol — the worker reads it there and re-presents it as a Bearer header upstream. Anthropic has no WebSocket API. A long-lived socket is rate-limited and validated **once at connect**, so a revoke applies to the next connection, not an open stream.
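
The browser subprotocol handling described above might look roughly like this. It is a Python illustration rather than the worker's TypeScript; the helper names are hypothetical, and dropping the key-bearing subprotocol from the upstream handshake is an assumption (the text only says the key is re-presented as a Bearer header).

```python
from typing import Dict, Optional

KEY_PREFIX = "openai-insecure-api-key."


def token_from_subprotocols(header_value: str) -> Optional[str]:
    """Pull the token out of a Sec-WebSocket-Protocol header value."""
    for proto in header_value.split(","):
        proto = proto.strip()
        if proto.startswith(KEY_PREFIX):
            return proto[len(KEY_PREFIX):] or None
    return None


def upstream_ws_headers(inbound: Dict[str, str], real_key: str) -> Dict[str, str]:
    """Build auth-related headers for the upstream upgrade request."""
    h = {k.lower(): v for k, v in inbound.items()}
    offered = [p.strip() for p in h.get("sec-websocket-protocol", "").split(",")]
    # Assumption: the key-bearing subprotocol is not forwarded upstream.
    kept = [p for p in offered if p and not p.startswith(KEY_PREFIX)]
    out = {"authorization": f"Bearer {real_key}"}
    if kept:
        out["sec-websocket-protocol"] = ", ".join(kept)
    return out
```

With `Sec-WebSocket-Protocol: realtime, openai-insecure-api-key.<proxy-token>`, the first helper yields the proxy token for validation and the second produces a handshake that carries only the real key.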

## How it works

The proxy token rides in the SDK's normal auth slot. The worker validates it, checks it's scoped to the requested provider, strips every inbound auth header, sets the one real key, and forwards the request (path + query verbatim, streaming included). Routing is by which auth header the token arrives in — see [docs/architecture.md](docs/architecture.md) for the routing table and full design.
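
The validate-then-swap steps in this paragraph can be sketched as follows. This is a Python illustration, not the worker's TypeScript: the record fields (`providers`, `expires_at`, `disabled`), the function names, and the rejection strings are all hypothetical, and a plain dict stands in for KV.

```python
import hashlib
import time
from typing import Dict, Optional

# Every inbound auth header is stripped before the real key is set.
AUTH_HEADERS = ("authorization", "x-api-key", "x-goog-api-key")

# Native upstream slot per provider. Gemini's OpenAI-compatible path would
# use a Bearer slot instead; that case is omitted to keep the sketch short.
REAL_KEY_SLOT = {
    "openai": ("authorization", "Bearer {}"),
    "anthropic": ("x-api-key", "{}"),
    "gemini": ("x-goog-api-key", "{}"),
}


def token_hash(token: str) -> str:
    """Only the SHA-256 hash of a token is stored, never the token itself."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def validate(store: Dict[str, dict], token: str, provider: str,
             now: Optional[float] = None) -> Optional[str]:
    """Return None if the token may be used, else a rejection reason."""
    now = time.time() if now is None else now
    record = store.get(token_hash(token))
    if record is None:
        return "unknown token"
    if record.get("disabled"):
        return "disabled"
    expires_at = record.get("expires_at")
    if expires_at is not None and now >= expires_at:
        return "expired"  # expiry is checked at validation time
    if provider not in record["providers"]:
        return "provider not in scope"
    return None


def swap_headers(inbound: Dict[str, str], provider: str, real_key: str) -> Dict[str, str]:
    """Drop all inbound auth headers, then set the one real key."""
    out = {k: v for k, v in inbound.items() if k.lower() not in AUTH_HEADERS}
    name, fmt = REAL_KEY_SLOT[provider]
    out[name] = fmt.format(real_key)
    return out
```

Stripping every auth header, rather than only the one the token arrived in, is what guarantees a proxy token is never forwarded upstream even if a client sends it in more than one slot.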

## Setup

```bash
@@ -55,41 +70,47 @@ Optional plain vars (NOT secrets) override the upstreams; they default to the re

## Admin dashboard

Visit `https://<worker>/admin`, sign in with `ADMIN_SECRET`, and create tokens: give each a label, the providers it may use (OpenAI / Anthropic / Gemini), and either type a token or generate one. The token is shown **once** at creation — copy it then; only its SHA-256 hash is stored. Disable or delete any token instantly.
Visit `https://<worker>/admin`, sign in with `ADMIN_SECRET`, and create tokens: give each a label, the providers it may use (OpenAI / Anthropic / Gemini), an optional expiry, and either type a token or generate one. The token is shown **once** at creation — copy it then; only its SHA-256 hash is stored. Disable or delete any token instantly.

## Per-token controls

- **Expiry** — optionally set an expiry at creation; past it the token is rejected and the dashboard shows it as `expired`.
- **Rate limit** — each token is capped at 100 requests / 60s (`429` + `Retry-After` over the limit). Tune `[[ratelimits]]` in `wrangler.toml`. It is a per-colo, loose ceiling for abuse protection, not a strict quota.
- **Scope & revoke** — a token only reaches the providers you check; disable or delete to revoke (KV propagation is up to ~60s).

## Security

- Real provider keys are Cloudflare secrets, injected only into outbound requests — never in KV, never returned to callers.
- Tokens are stored as SHA-256 hashes; a KV/dashboard dump yields unusable hashes, not live tokens.
- The worker strips all inbound auth headers before setting the real key, so a doppelganger token is never forwarded upstream.
- The worker strips all inbound auth headers before setting the real key, so a proxy token is never forwarded upstream.
- Do not host the worker on a `*.openai.azure.com` / `*.cognitiveservices.azure.com` domain (the OpenAI SDK switches to Azure auth on those hostnames).

## Testing

Two tiers (Vitest):

```bash
nub run test:unit # tier 1: proxy logic in workerd (vitest-pool-workers), fast CI gate
nub run test:compat # tier 2: real openai / @anthropic-ai/sdk / @google/genai SDKs vs a local worker + mock upstream
nub run test # both
nub run test:compat # tier 2: real client libs (official SDKs, Vercel AI SDK, LangChain, Genkit) + raw fetch + a real wss round-trip vs a mock upstream
nub run test:py # tier 2 (Python): LiteLLM, LlamaIndex, instructor, Pydantic AI through the worker (needs the venv below)
nub run test # all of the above
```

Tier 2 starts the real worker (`unstable_dev`) with `*_UPSTREAM` pointed at a `node:http` mock, seeds a token via the admin API, then drives each real SDK and asserts the forwarded request carries the real key (and never the token).
Tier 2 starts the real worker (`unstable_dev`) with `*_UPSTREAM` pointed at a `node:http` mock, seeds a token via the admin API, drives each real client, and asserts the forwarded request carries the real key (and never the token). The Python runner (`test/run-py.mjs`) owns the same worker + mock and runs each `*.py` as a thin client. **Each file in `test/sdk-compat/` is named after the package it drives and doubles as a usage example** — copy the `baseURL`/`apiKey` wiring from the file matching your client (e.g. `ai-sdk-openai.ts`, `langchain-anthropic.ts`, `genkit.ts`, `pydantic-ai.py`), from `fetch.ts` for raw HTTP, or from `websocket.ts` for a wss client.

## Disable / Enable
**What's tested, and what's by-construction.** The worker routes by *which auth slot a request uses*, not by SDK — so a provider's packages behave identically once the slot is fixed. We therefore test **each distinct library once, in one language** — the official `openai` / `@anthropic-ai/sdk` / `@google/genai` SDKs, the Vercel AI SDK, LangChain, Genkit, LiteLLM, LlamaIndex, instructor, and Pydantic AI (see `test/sdk-compat/`) — and treat the rest as compatible-by-construction: a tested SDK's other-language packages (`openai-python`/`-go`/`-java`/...), end-user apps (Aider, Cline, Continue, Open WebUI), and JVM/.NET frameworks (Spring AI, Semantic Kernel) each reuse a slot already proven. The per-provider proof matrix is in [docs/learnings/compat-is-the-auth-slot-not-the-sdk.md](docs/learnings/compat-is-the-auth-slot-not-the-sdk.md). Two gotchas: use Anthropic's normal API-key mode (its OAuth `authToken` mode sends `Bearer`, which would route to OpenAI), and the legacy `google-generativeai` Python SDK needs `transport="rest"` (it defaults to gRPC and won't traverse an HTTP proxy otherwise).

`schedule.sh` toggles the worker's `workers_dev` URL without deleting it:
The Python runner uses a local venv. One-time setup with [uv](https://docs.astral.sh/uv/):

```bash
./schedule.sh disable # now
./schedule.sh disable +30m # in 30 minutes
./schedule.sh enable 22:00 # at 10pm
uv venv
uv pip install -r test/requirements.txt
```

> **Gemini is untested with the actual API.** No test hits a live provider — all three run against a mock upstream. OpenAI and Anthropic are additionally verified live in deployment; Gemini is **not**, because `GEMINI_API_KEY` isn't set yet, so the Gemini route has never run against the real Google Generative Language API. Treat it as built-but-unproven until a key is added.

## Cost

Cloudflare Workers free tier covers this (100k requests/day). You only pay upstream providers for API usage.

## Contributing

Issues are welcome. PRs are not accepted and will be auto-closed.
Issues are welcome. External PRs are not accepted and will be auto-closed.
8 changes: 4 additions & 4 deletions schedule.sh → _legacy/schedule.sh
@@ -10,9 +10,9 @@ for arg in "$@"; do
*) time_args+=("$arg") ;;
esac
done
# Default target is the single token-gated worker (wrangler.toml). The provider flags
# still target the legacy per-provider workers during the transition.
config="${label:+wrangler.${label}.toml}"; config="${config:-wrangler.toml}"
# Archived helper (kept aside in _legacy/). Default target is the active token-gated worker at the
# repo root (wrangler.toml); the provider flags target the archived v1 workers in _legacy/v1/.
config="${label:+_legacy/v1/wrangler.${label}.toml}"; config="${config:-wrangler.toml}"
name="${label:-api-proxy}"
time_arg="${time_args[*]:-}"

@@ -21,7 +21,7 @@ time_arg="${time_args[*]:-}"
}

[[ "$action" == "enable" ]] && from=false to=true || from=true to=false
dir=$(cd "$(dirname "$0")" && pwd)
dir=$(cd "$(dirname "$0")/.." && pwd) # repo root (this script lives in _legacy/)

if [[ -z "$time_arg" ]]; then
sed -i '' "s/workers_dev = $from/workers_dev = $to/" "$dir/$config"
35 changes: 35 additions & 0 deletions _legacy/v1/README.md
@@ -0,0 +1,35 @@
# v1 - one worker per provider (archived)

The original proxy: **three separate Workers**, one per provider, each a thin pass-through
that swaps the hostname and injects the real key from its own secret.

```
client ──▶ openai-proxy ──(Bearer OPENAI_API_KEY)──▶ api.openai.com
client ──▶ claude-proxy ──(x-api-key ANTHROPIC_KEY)─▶ api.anthropic.com
client ──▶ gemini-proxy ──(x-goog-api-key GEMINI)──▶ generativelanguage.googleapis.com
```

| File | Worker | Upstream | Key slot it sets |
|---|---|---|---|
| `openai.ts` / `wrangler.openai.toml` | `openai-proxy` | api.openai.com | `Authorization: Bearer` |
| `claude.ts` / `wrangler.claude.toml` | `claude-proxy` | api.anthropic.com | `x-api-key` |
| `gemini.ts` / `wrangler.gemini.toml` | `gemini-proxy` | generativelanguage.googleapis.com | `x-goog-api-key` (and strips `?key=`) |

## Why it was replaced

- **No auth.** Each worker injected the real upstream key for *any* caller. Anyone who knew
the URL spent the key. There were no shareable, revocable tokens.
- **Three deploys, three URLs.** Clients had to know which worker maps to which provider, and
each needed its own secret and deploy.

v2 (the active root worker) collapses all three into **one** worker that routes by auth header,
gates every request behind a hashed [proxy token](../../docs/learnings/proxy-token-security.md),
and adds the [OpenAI geo-403 egress fix](../../docs/learnings/openai-egress-geo-block.md). See
[provider routing by auth header](../../docs/learnings/provider-routing-by-auth-header.md) for how
one base URL serves all three.

## Status

Reference only - **not deployed, not built, not tested.** Kept to document where the project
started. The `main` paths in these tomls point at files in this folder, so each could still be
deployed standalone (`wrangler deploy -c _legacy/v1/wrangler.openai.toml`) if ever needed.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion wrangler.claude.toml → _legacy/v1/wrangler.claude.toml
@@ -1,5 +1,5 @@
name = "claude-proxy"
main = "src/claude.ts"
main = "claude.ts"
compatibility_date = "2025-01-01"
workers_dev = true
preview_urls = false
2 changes: 1 addition & 1 deletion wrangler.gemini.toml → _legacy/v1/wrangler.gemini.toml
@@ -1,5 +1,5 @@
name = "gemini-proxy"
main = "src/gemini.ts"
main = "gemini.ts"
compatibility_date = "2025-01-01"
workers_dev = true
preview_urls = false
2 changes: 1 addition & 1 deletion wrangler.openai.toml → _legacy/v1/wrangler.openai.toml
@@ -1,5 +1,5 @@
name = "openai-proxy"
main = "src/openai.ts"
main = "openai.ts"
compatibility_date = "2025-01-01"
workers_dev = true
preview_urls = false
2 changes: 1 addition & 1 deletion biome.json
@@ -1,5 +1,5 @@
{
"$schema": "https://biomejs.dev/schemas/2.5.0/schema.json",
"$schema": "https://biomejs.dev/schemas/2.5.1/schema.json",
"vcs": {
"enabled": true,
"clientKind": "git",