v2.1: token expiry, browser/CORS support, per-token rate limiting #2
Open
Bug-Finderr wants to merge 22 commits into
…p v2.1
- doppelganger -> "proxy token" across docs/README/tests/comments; token prefix dgk_ -> ptk_ (cosmetic only; tokens validate by hash, existing ones unaffected)
- move the three v1 per-provider passthrough workers + wrangler configs to _legacy/v1/ (git renames) with a README; reference-only, not deployed
- document Gemini as built-but-unproven (no key yet; mock-tested only)
- align the lefthook pre-commit biome flag with the lint script (--unsafe)
- optional expiresAt (UTC ISO) on TokenMetadata + CreateInput
- getValidatedByHash rejects past/malformed expiry; fail-closed on NaN
- admin: "Expires (optional)" datetime field + "Expires" column; expired tokens show "expired" and dim the row
- not KV expirationTtl (60s floor, deletes the record, orphans the :lu key)
- tests: absent / future / past / malformed expiry
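A minimal sketch of the check-at-validate expiry rule, assuming a trimmed-down `TokenMetadata` shape and a `Map` standing in for the KV lookup. `isExpired` and `validateByHash` are hypothetical names, not the worker's code. It shows why a malformed `expiresAt` is rejected: `Date.parse` returns `NaN`, and the check treats that as expired rather than as "no expiry".

```typescript
// Hypothetical, trimmed-down token record; the real TokenMetadata has more fields.
interface TokenMetadata {
  name: string;
  expiresAt?: string; // optional UTC ISO 8601 timestamp
}

// Returns true when validation must reject the token on expiry grounds.
function isExpired(meta: TokenMetadata, now: number = Date.now()): boolean {
  if (meta.expiresAt === undefined) return false; // no expiry set: never expires
  const expiry = Date.parse(meta.expiresAt);
  if (Number.isNaN(expiry)) return true; // malformed value: fail closed
  return expiry <= now; // at or past the expiry instant
}

// Hypothetical stand-in for getValidatedByHash: a Map replaces the KV lookup.
// The record is never deleted; an expired token simply stops validating.
function validateByHash(
  store: Map<string, TokenMetadata>,
  tokenHash: string,
  now: number = Date.now(),
): TokenMetadata | null {
  const meta = store.get(tokenHash);
  if (meta === undefined) return null;
  return isExpired(meta, now) ? null : meta;
}
```

Because nothing is deleted, this approach sidesteps the `expirationTtl` drawbacks listed in the commit: no 60s floor and no orphaned companion key.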
- handleProxy answers the OPTIONS preflight (204) before auth checks, reflecting Origin + the requested headers; previously every browser preflight 401'd
- reflect Origin on every response and expose the Gemini resumable-upload headers (x-goog-upload-url etc.) so browser clients can read them
- the Gemini upload URL still passes through verbatim (bytes go client->Google directly; the real key never rides that leg). Browser callers set the SDK's own opt-in.
- tests: preflight, reflected-Origin + expose-headers, no-Origin no-op
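A minimal sketch of the CORS behaviour described above, assuming fetch-style `Request`/`Response` objects as in Workers. `handlePreflight` and `withCors` are hypothetical names; the method allow-list is an assumption, and of the exposed headers only `x-goog-upload-url` is named in the commit (`x-goog-upload-status` is an assumption). The point it illustrates is ordering: the preflight is answered before any token check, because browsers never attach credentials to a preflight.

```typescript
// Assumed method allow-list; the worker's real list may differ.
const ALLOWED_METHODS = "GET, POST, PUT, PATCH, DELETE, OPTIONS";
// Gemini resumable-upload headers browsers need to read. x-goog-upload-url is
// named in the commit; x-goog-upload-status is an assumption.
const EXPOSE_HEADERS = "x-goog-upload-url, x-goog-upload-status";

// Answer a CORS preflight before any auth check runs. Returns null for
// non-OPTIONS requests so the caller continues to token validation.
function handlePreflight(req: Request): Response | null {
  if (req.method !== "OPTIONS") return null;
  const headers = new Headers({
    "Access-Control-Allow-Methods": ALLOWED_METHODS,
    Vary: "Origin",
  });
  const origin = req.headers.get("Origin");
  if (origin) headers.set("Access-Control-Allow-Origin", origin); // reflect, not "*"
  const requested = req.headers.get("Access-Control-Request-Headers");
  if (requested) headers.set("Access-Control-Allow-Headers", requested); // reflect requested headers
  return new Response(null, { status: 204, headers });
}

// Decorate every proxied response. No Origin (server-side caller): return as-is.
function withCors(req: Request, res: Response): Response {
  const origin = req.headers.get("Origin");
  if (!origin) return res;
  const out = new Response(res.body, res); // copy: upstream response headers may be immutable
  out.headers.set("Access-Control-Allow-Origin", origin);
  out.headers.set("Access-Control-Expose-Headers", EXPOSE_HEADERS);
  out.headers.append("Vary", "Origin");
  return out;
}
```

Reflecting the caller's `Origin` (plus `Vary: Origin` so caches keep per-origin copies) rather than sending `*` keeps the response valid for credentialed browser requests.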
- after token validation, limit() keyed on the SHA-256 hash; 429 + Retry-After: 60 on deny. Fail-open on a missing/erroring binding so it never bricks the proxy.
- wrangler [[ratelimits]] RATE_LIMITER at 100 req / 60s (one shared ceiling, KISS; tune freely). Verified live on the Free plan: the binding deploys and limit() enforces.
- per-colo + eventually consistent: a loose ceiling that stops sustained abuse, not a strict gate (documented).
- tests: deny -> 429, allow -> forward, throw -> fail-open
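A minimal sketch of the fail-open rate-limit step, assuming the Workers Rate Limiting binding shape `limit({ key }) -> Promise<{ success: boolean }>`. `checkRateLimit` is a hypothetical name and the 429 body text is an assumption. The hash is passed in rather than recomputed, since validation has already derived it for the KV lookup.

```typescript
// Shape of a Workers Rate Limiting binding as exposed on env (e.g. env.RATE_LIMITER).
interface RateLimiter {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

// Runs after token validation. Returns a 429 to send back, or null to keep going.
// Fail-open: a missing or throwing binding lets the request through.
async function checkRateLimit(
  limiter: RateLimiter | undefined,
  tokenHash: string, // SHA-256 hex of the proxy token, already computed for the KV lookup
): Promise<Response | null> {
  if (limiter === undefined) return null; // missing binding: fail open
  try {
    const outcome = await limiter.limit({ key: tokenHash });
    if (outcome.success) return null; // under the ceiling: forward
  } catch {
    return null; // erroring binding: fail open rather than brick the proxy
  }
  return new Response("Too Many Requests", {
    status: 429,
    headers: { "Retry-After": "60" }, // matches the 60s period
  });
}
```

Keying on the hash rather than the raw token means the limiter never sees a usable credential.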
…fresh, docs
- move schedule.sh -> _legacy/ (archived helper; paths resolve from the repo root, provider flags point at the archived _legacy/v1/ workers)
- dashboard: poll the token list every 10s so new tokens / lastUsed surface despite KV list() eventual consistency (~60s)
- README: document per-token controls (expiry, rate limit) + browser/CORS support, mark Gemini "untested with the actual API", point disable/enable at _legacy/
…rom README
- docs/architecture.md: the full current design (topology, request flow, routing, auth swap, token model, rate limiting, CORS, OpenAI egress DO, admin, testing, security)
- replaces the superpowers design spec
- learnings: rate-limit binding (free on the Free plan, loose per-colo) and token expiry (check-at-validate, fail-closed)
- README: link to docs/architecture.md; drop the legacy schedule.sh section
- README: lead with "Use it": the libraries it works with (official OpenAI/Anthropic/GenAI SDKs + any standard-auth client) and how a client points at the worker; trim "How it works" to a brief mechanism + a link to architecture.md
- architecture.md: replace ASCII pseudocode/box diagrams with numbered lists + a simple dispatch snippet; drop the repo-layout section (it just rots)
Restore the request-flow/topology formatting from 5df93ab (a prior full-file rewrite had overwritten it) and remove the repo-layout section, which only rots.
…after the client
- fetch.ts: raw HTTP; covers the Gemini ?key= auth slot (no SDK exercises it), verbatim request-body forwarding, and an end-to-end CORS preflight
- litellm.py: separate Python runner (local .venv) driving LiteLLM through the worker; run via `nub run test:py`, also chained into `nub run test`
- rename openai/anthropic/gemini/fetch .test.ts -> .ts (file = the client it drives); the compat config globs test/sdk-compat/*.ts and excludes the setup.ts harness
- README documents each test as a per-client usage example + the venv setup
…learning
- architecture.md: note auth-slot precedence over ?key=, the CORS method allow-list + Vary: Origin, and Path=/admin on the admin cookie; drop the unsourced "~60%" figure; trim the §15 recap; fix the ratelimits TOML spacing
- learnings: new cors-preflight-and-upload-passthrough.md (preflight before auth; the Gemini upload URL is passed through, not rewritten) + note the 8-way egress DO pool
…irements to test/
- gemini.ts -> google-genai.ts (@google/genai), anthropic.ts -> anthropic-ai-sdk.ts (@anthropic-ai/sdk); openai.ts already matches its package. Avoids collision with the future Vercel AI SDK (@ai-sdk/*) per-provider tests (see HANDOFF).
- move requirements.txt to test/ (beside the run-py.mjs runner); update the path in run-py.mjs, litellm.py, requirements.txt, and the README
An 8-agent source-level survey (official SDKs in Python/Node/Go/Java/Ruby/.NET, Vercel AI SDK incl. @ai-sdk/google, LangChain JS+Py, LiteLLM, LlamaIndex, instructor, Aider/Cline/Continue/Open WebUI) confirms that every client collapses onto one of the 4 auth slots already tested; none hits a new slot or path. The decisive case: @ai-sdk/google sends the x-goog-api-key header at source (not ?key=, not Bearer), which maps to the existing gemini slot. Per-SDK / per-language compat tests would therefore be redundant given the proxy's routing logic, so none were added. Instead: a new learnings doc with the proof matrix + caveats (Anthropic OAuth Bearer mode, legacy google-generativeai gRPC default, OpenAI /v1/responses verbatim forward), and README + architecture.md tightened to the auth-slot-not-SDK claim.
- Drop the fabricated `hitsNewProxyPath` symbol reference (it was a research schema field, not a codebase symbol; a reader would grep and find nothing).
- Resolve the "four slots" conflation: the table lists the four provider routes SDKs use (3 distinct header slots + the /v1beta/openai/ path split), and the `?key=` query slot (no SDK uses it, only raw HTTP) is now called out explicitly as the fourth slot the proxy reads.
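The slot model is easier to see as code. A minimal sketch, assuming fetch-style `Headers` and `URL`: the commits state only that header slots take precedence over `?key=`, so the order among the three header slots here is an assumption, as is putting the real key back into the same slot the token arrived in. The header names are the providers' standard ones (the commits name only `x-goog-api-key`); `extractToken` and `swapKey` are hypothetical names.

```typescript
// A slot is where the proxy token arrives and where the real key goes back.
type Slot =
  | { kind: "header"; name: string; bearer: boolean }
  | { kind: "query" };

// Assumed header order; the commits only state that header slots win over ?key=.
const HEADER_SLOTS: ReadonlyArray<{ name: string; bearer: boolean }> = [
  { name: "authorization", bearer: true }, // OpenAI, and Gemini's /v1beta/openai/ path
  { name: "x-api-key", bearer: false }, // Anthropic
  { name: "x-goog-api-key", bearer: false }, // Gemini native (@google/genai, @ai-sdk/google)
];

function extractToken(headers: Headers, url: URL): { token: string; slot: Slot } | null {
  for (const s of HEADER_SLOTS) {
    const raw = headers.get(s.name);
    if (raw === null) continue;
    const token = s.bearer ? /^Bearer\s+(\S+)$/i.exec(raw.trim())?.[1] : raw.trim();
    if (token) return { token, slot: { kind: "header", name: s.name, bearer: s.bearer } };
  }
  const q = url.searchParams.get("key"); // raw-HTTP-only slot, checked last
  return q ? { token: q, slot: { kind: "query" } } : null;
}

// Write the real key into the slot the token came from, on copies, so the
// proxy token is overwritten and never reaches upstream.
function swapKey(headers: Headers, url: URL, slot: Slot, realKey: string): { headers: Headers; url: URL } {
  const outHeaders = new Headers(headers);
  const outUrl = new URL(url.toString());
  if (slot.kind === "header") {
    outHeaders.set(slot.name, slot.bearer ? `Bearer ${realKey}` : realKey);
  } else {
    outUrl.searchParams.set("key", realKey);
  }
  return { headers: outHeaders, url: outUrl };
}
```

Every SDK in the survey ends up in one of these four branches, which is why the compat claim is about slots rather than libraries.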
State plainly that each provider has one real-SDK anchor test (openai.ts/litellm.py, anthropic-ai-sdk.ts, google-genai.ts/fetch.ts) and that the by-construction claim is only valid because it extends those verified anchors - without an anchor it proves nothing.
…workerd)
Per the clarified rule (dedup across LANGUAGES of a package, not across packages), each distinct client library now gets one end-to-end test:
- Node: Vercel AI SDK (@ai-sdk/openai|anthropic|google), LangChain (@langchain/openai|anthropic|google-genai), Genkit
- Python: LlamaIndex (openai+anthropic+gemini), instructor, Pydantic AI
Each drives the real library at the worker and asserts the mock saw the real key swapped into the right slot with the proxy token absent.
Mastra is EXCLUDED: nub flagged @mastra/core 1.x as malicious (advisory MAL-2026-6011, embedded malicious code); not bypassed.
Other-language packages of a tested SDK, end-user apps (Aider/Cline/Continue/OpenWebUI), and JVM/.NET frameworks stay documented as compatible-by-construction.
Fix test:py hang + workerd leak: litellm.py spawned `npx wrangler dev` with shell=True, so terminate() orphaned workerd on Windows. Rebuilt test/run-py.mjs to own ONE worker (unstable_dev, clean teardown) + ONE mock with /__captured + /__reset endpoints; Python files are thin clients reading PROXY_* env. Async spawn (spawnSync froze the mock event loop) + a hard per-file timeout so nothing hangs.
Docs: README, architecture.md §13, and the compat learning updated to the test-each-library / dedup-across-languages framing.
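A minimal sketch of how a shared capture mock with `/__captured` and `/__reset` lets thin clients assert the key swap, written in TypeScript for consistency with the other examples rather than the runner's JavaScript. It assumes a fetch-style handler; `createCaptureMock`, `sawOnlyRealKey` and the recorded fields are hypothetical, not taken from run-py.mjs.

```typescript
// One captured upstream request, as the mock records it.
interface Captured {
  method: string;
  path: string;
  search: string;
  headers: Record<string, string>;
}

// Hypothetical fetch-style handler for the shared mock upstream.
function createCaptureMock(): (req: Request) => Promise<Response> {
  let captured: Captured[] = [];
  const json = (body: unknown) =>
    new Response(JSON.stringify(body), { headers: { "content-type": "application/json" } });

  return async (req) => {
    const url = new URL(req.url);
    if (url.pathname === "/__captured") return json(captured); // test reads what the proxy forwarded
    if (url.pathname === "/__reset") {
      captured = []; // cleared between test files
      return new Response(null, { status: 204 });
    }
    const headers: Record<string, string> = {};
    req.headers.forEach((value, name) => {
      headers[name] = value;
    });
    captured.push({ method: req.method, path: url.pathname, search: url.search, headers });
    return json({ ok: true }); // placeholder provider reply
  };
}

// The check each compat test makes: the real key sits in the expected slot
// and the proxy token appears nowhere in the forwarded request.
function sawOnlyRealKey(c: Captured, headerName: string, expected: string, proxyToken: string): boolean {
  return c.headers[headerName.toLowerCase()] === expected && !JSON.stringify(c).includes(proxyToken);
}
```

Keeping the capture state behind HTTP endpoints is what lets the Python files stay thin: they only need the proxy URL and token from the `PROXY_*` env, plus one GET to `/__captured`.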
- Pin test/requirements.txt with ~= (lock minor): these libs shift default endpoints across minors, which the compat tests encode. Drop unused llama-index-llms-openai-like.
- run-py.mjs: match setup.ts providerFromPath (/v1beta/ fallback) so the two mocks do not drift; strip real provider keys (OPENAI/ANTHROPIC/GEMINI/GOOGLE) from the child env so the seeded proxy token is the only key in play.
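A minimal sketch of the child-env scrub, assuming provider credentials are identified by name prefix. `buildChildEnv` is a hypothetical name, and the `PROXY_BASE_URL` / `PROXY_TOKEN` names in the usage below are hypothetical (the commit only says `PROXY_*`). The intent is that a client library which silently falls back to an ambient `OPENAI_API_KEY` cannot make a test pass without going through the proxy.

```typescript
// Env-var name prefixes for real provider credentials, per the commit message.
const PROVIDER_PREFIXES = ["OPENAI", "ANTHROPIC", "GEMINI", "GOOGLE"] as const;

// Build the environment for a Python test child: copy the parent env minus any
// real provider keys, then add the PROXY_* settings pointing at the worker.
function buildChildEnv(
  parent: Record<string, string | undefined>,
  proxy: Record<string, string>,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [name, value] of Object.entries(parent)) {
    if (value === undefined) continue;
    const upper = name.toUpperCase(); // Windows env names are case-insensitive
    if (PROVIDER_PREFIXES.some((prefix) => upper.startsWith(prefix))) continue;
    env[name] = value;
  }
  return { ...env, ...proxy };
}
```

A prefix match is deliberately broad: it also drops variables such as `GOOGLE_APPLICATION_CREDENTIALS`, which fits "the seeded proxy token is the only key in play" but is a choice of this sketch. The result would be passed as the `env` option of `child_process.spawn`.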
uv venv + uv pip install -r is the fast drop-in for python -m venv + pip: uv pip install auto-targets the .venv, no activation needed. The runner's .venv/Scripts/python path is unchanged (uv creates a standard venv). Verified: recreated .venv with uv (Python 3.14.5), test:py all green. Updated README, requirements.txt header, and the run-py.mjs skip hint.
litellm 1.83.7->1.89.3, openai 2.30.0->2.43.0, pydantic 2.12.5->2.13.4 (others already latest). Recreated .venv with uv, full suite green: 72 unit + 16 compat + 4 python.
A WebSocket upgrade now rides the proxy: src/ws.ts validates the token (hash -> KV -> scope -> rate limit) exactly like the HTTP path, swaps the real key into the slot the provider's wss API expects, opens the upstream socket with fetch(Upgrade: websocket), and pumps frames both ways through a WebSocketPair.
- Auth slots: Authorization Bearer (OpenAI /v1/realtime + /v1/responses WebSocket mode), Sec-WebSocket-Protocol openai-insecure-api-key.<token> (browser Realtime; rewritten to a Bearer header upstream, because browsers can't set headers), and ?key= (Gemini Live). Anthropic has no wss API.
- Reuses the OpenAI geo-403 fallback: a blocked upgrade retries through the NA-pinned egress DO.
- Manual pipe (not pass-through) so the negotiated subprotocol echoes back deterministically; binaryType pinned to arraybuffer so binary realtime audio survives future compatibility_date bumps.
- Tests: tier-1 test/ws.test.ts (slot extraction, swap, security invariant, rate limit, geo-403, 502, close-code sanitization) and tier-2 test/sdk-compat/websocket.ts (real ws client -> worker -> ws mock upstream round-trip). New dev deps: ws, @types/ws.
UNTESTED AGAINST REAL APIS: all WebSocket coverage runs against a mock ws upstream only. No test has connected to a live OpenAI Realtime / Responses or Gemini Live endpoint (no valid provider keys available), and the geo-blocked WS-over-DO hop is exercised with a faked DO only. Treat the wss path as built-but-unproven live.
Also rides along (pre-staged housekeeping): dev-dep minor bumps and the llama-index compat test's Claude model id.
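The browser Realtime slot is the least obvious of the three. A minimal sketch of the rewrite, assuming the standard comma-separated `Sec-WebSocket-Protocol` offer; `parseSubprotocolToken` and `upstreamUpgradeHeaders` are hypothetical names, and forwarding the remaining subprotocols upstream is an assumption about what src/ws.ts does.

```typescript
// Browser Realtime clients pass the key as a subprotocol because the
// WebSocket constructor cannot set an Authorization header.
const INSECURE_KEY_PREFIX = "openai-insecure-api-key.";

// Pull the proxy token out of Sec-WebSocket-Protocol and keep the other offers.
function parseSubprotocolToken(header: string | null): { token: string; remaining: string[] } | null {
  if (header === null) return null;
  const offered = header.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
  const keyed = offered.find((p) => p.startsWith(INSECURE_KEY_PREFIX));
  if (keyed === undefined) return null;
  const token = keyed.slice(INSECURE_KEY_PREFIX.length);
  if (token.length === 0) return null;
  return { token, remaining: offered.filter((p) => p !== keyed) };
}

// Headers for the upstream upgrade: the real key moves to a Bearer header and the
// key-bearing subprotocol is dropped, so the proxy token never goes upstream.
function upstreamUpgradeHeaders(remaining: string[], realKey: string): Headers {
  const headers = new Headers({ Upgrade: "websocket", Authorization: `Bearer ${realKey}` });
  if (remaining.length > 0) headers.set("Sec-WebSocket-Protocol", remaining.join(", "));
  return headers;
}
```

In a Worker these headers would go on the `fetch(upstreamUrl, { headers })` upgrade call, whose response carries a `webSocket` to accept and pipe into the `WebSocketPair`.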
Summary
KISS, key-safe, free-tier additions to the token-gated proxy, plus prep/cleanup. The real provider key never rides any new path.
Features
- … (60445bd): optional `expiresAt` (UTC ISO) on a token; enforced in `getValidatedByHash` (rejects past/malformed, fail-closed on NaN). Not KV `expirationTtl` (60s floor, deletes the record, orphans the `:lu` key). The admin dashboard gets an "Expires" field + column; expired tokens render as `expired` and dim.
- … (7d05d45): `handleProxy` answers `OPTIONS` before auth (previously every browser preflight 401'd), reflects `Origin` on every response, and exposes the Gemini resumable-upload headers. The upload URL still passes through verbatim (bytes go client->Google; the key is never on that leg).
- … (4c928a3): Workers Rate Limiting binding keyed on the token hash; `429` + `Retry-After: 60` on deny, fail-open so a missing/erroring binding never bricks the proxy. 100 req/60s shared ceiling (tunable). It's a per-colo, loose ceiling: abuse protection, not a strict quota.
- … (dcfb71c): a WS upgrade rides the same validate -> scope -> rate-limit -> key-swap pipeline (`src/ws.ts`), then pipes frames both ways. Covers OpenAI Realtime (`/v1/realtime`) + Responses WebSocket mode (`/v1/responses`) and Gemini Live; Anthropic has no wss API. Handles the wider WS auth-slot set: `Authorization: Bearer`, the `openai-insecure-api-key.<token>` subprotocol (browsers can't set headers, so it is rewritten to a Bearer header upstream), and `?key=`. Reuses the OpenAI geo-403 egress DO for blocked upgrades; `binaryType` is pinned to `arraybuffer` so binary realtime audio survives future `compatibility_date` bumps.
Per-library compat tests
One end-to-end test per distinct library, in one language (dedup across languages, not packages): Vercel AI SDK (x3 providers), LangChain (x3), Genkit, LlamaIndex (x3), instructor, Pydantic AI, LiteLLM, plus the official SDKs and raw `fetch`. Mastra is excluded (`@mastra/core` is flagged by advisory MAL-2026-6011). Proof matrix: `docs/learnings/compat-is-the-auth-slot-not-the-sdk.md`.
Prep / cleanup
- `dgk_` -> `ptk_` (cosmetic; existing tokens validate by hash, unaffected) (b2dad21).
- … `schedule.sh` to `_legacy/` (b2dad21, bfe4d3a).
- … (`list()` ~60s lag) (bfe4d3a).
- `test:py` runner rebuilt (owns one worker + one mock; async spawn + hard timeout), which fixes the orphaned-`workerd` hang.
Deferred (not in this PR)
Spend/usage caps (needs a metering DO + SSE usage parsing), key pools (YAGNI), concurrency / longer windows.
Testing
- … (`ws` client -> worker -> `ws` mock round-trip) + 4 tier-2 Python.
- `tsc` + biome clean.
- … (`limit()` enforces on the Free plan; this was the only research claim that was unverified).
- … `ws` upstream only: no test has connected to a live OpenAI Realtime/Responses or Gemini Live endpoint (no valid provider keys available), and the geo-blocked WS-over-DO hop is exercised with a faked DO only. Treat the wss path as built-but-unproven live.