feat(routing): wire Cerebras + drop byok-paid + deprecate localFirstAI #53

Merged
Pterjudin merged 3 commits into main from feat/free-tier-router-followup-2026-05-25
May 25, 2026
@Pterjudin

Summary

Follow-up to the merged free-tier router (PR #51). Addresses three of the five items punted in that PR's body.

Changes

  • `refactor(routing): drop unused byok-paid routing policy`: `byok-paid` was declared in the `RoutingPolicy` union but had no distinct implementation behind it. `auto-cheapest` already serves the "user has paid keys, route by capability" use case. The value has been removed from the union, the `Settings.tsx` select, and any switch arms. A persisted `byok-paid` value is silently coerced to `auto-cheapest` on load.
  • `refactor(routing): deprecate localFirstAI in favour of routingPolicy`: `localFirstAI` and `routingPolicy: 'local-only'` were two settings for the same intent. `localFirstAI` is now marked `@deprecated`. On settings load, when `routingPolicy` is unset, `localFirstAI: true` migrates once to `routingPolicy = 'local-only'` and `localFirstAI: false` to `'auto-cheapest'`; an explicitly set `routingPolicy` always wins. Routing call sites check `routingPolicy === 'local-only' || localFirstAI`, so installs that have not migrated yet keep working. The `localFirstAI` field stays in the settings type for backward compatibility with stored configs and can be removed in a future release.
  • `feat(providers): wire Cerebras as a first-class provider`: Cerebras is now in the provider registry with its OpenAI-compatible endpoint, default model list (Llama 4 Scout, Qwen 3 32B, DeepSeek R1 Distill Llama 70B, Llama 3.3 70B), and capability flags. The Cerebras entry in `freeTierConstants.ts` now resolves to `cortexProviderName: 'cerebras'`, so the free-tier ladder picks it up.

Still punted (separate work)

  • Real token counts from provider responses: the free-tier router currently uses `output.length / 4` as a proxy. A real implementation requires plumbing `usage` from each provider's response shape through `sendLLMMessage.impl.ts`.
  • Status-bar widget styling: the current widget is functional but unpolished. An icon-only collapsed state and a warning colour above 80% quota consumption are follow-ups.
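
The token-count item above could eventually look something like the sketch below. It is only an illustration: `ProviderUsage` and `estimateOutputTokens` are hypothetical names, the `completion_tokens` field follows the OpenAI-compatible response shape, and rounding the `output.length / 4` proxy up is an assumption rather than the router's documented behaviour.

```typescript
// Hypothetical usage shape, modelled on OpenAI-compatible responses.
interface ProviderUsage {
  completion_tokens?: number;
}

// Prefer the provider-reported count; fall back to the length / 4 proxy.
function estimateOutputTokens(output: string, usage?: ProviderUsage): number {
  if (usage?.completion_tokens !== undefined) {
    return usage.completion_tokens;
  }
  // Assumption: round up so a short non-empty reply never counts as 0 tokens.
  return Math.ceil(output.length / 4);
}
```

With this shape, providers that do not report usage keep the current behaviour, so the plumbing can land one provider at a time.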

Verification

  • `npm run compile-check-ts-native` — clean
  • `npm run valid-layers-check` — no new violations
  • Existing 6 `freeTierLadder` tests still pass

Tajudeen and others added 3 commits May 25, 2026 05:06
The 'byok-paid' option in the routing policy union was never wired to a
distinct selection path; it fell through to the same scoring code as
'auto-cheapest'. Removing it tightens the type, simplifies the Settings
UI, and avoids advertising a feature that does not exist.

Users with the value persisted (set via the Settings select before this
change) are silently coerced to 'auto-cheapest' on load - the score-based
selector already handles the "prefer paid BYOK keys" use case via its
existing capability tier weights. If a use case for an explicit paid-only
filter surfaces, it can be reintroduced behind a clearer name.
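
A minimal sketch of the load-time coercion described above, under stated assumptions: the two-member `RoutingPolicy` union is a simplified stand-in for the real one, and `coerceRoutingPolicy` is a hypothetical helper name, not the function in this change.

```typescript
// Assumption: simplified union; the real RoutingPolicy may have more members.
type RoutingPolicy = 'auto-cheapest' | 'local-only';

const KNOWN_POLICIES: readonly RoutingPolicy[] = ['auto-cheapest', 'local-only'];

// Map a raw persisted value onto the tightened union.
function coerceRoutingPolicy(stored: unknown): RoutingPolicy | undefined {
  // The removed value is silently mapped onto its replacement.
  if (stored === 'byok-paid') {
    return 'auto-cheapest';
  }
  // Any other value must already be a member of the union.
  const known = KNOWN_POLICIES.find((policy) => policy === stored);
  return known; // undefined when nothing valid was persisted
}
```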

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`localFirstAI` and `routingPolicy === 'local-only'` describe the same
intent ("never route to the cloud"). Maintaining two switches is a
source of drift; this collapses them onto `routingPolicy` and keeps
`localFirstAI` as a read-only backward-compat input.

Migration runs once on settings load: when `routingPolicy` is unset
(typical for installs predating this work), `localFirstAI: true` is
translated to `routingPolicy: 'local-only'` and `localFirstAI: false`
to `'auto-cheapest'`. An explicitly-set `routingPolicy` always wins.

Routing call sites that previously read `localFirstAI` now read both
signals (`routingPolicy === 'local-only' || localFirstAI`) so installs
that haven't migrated yet keep working. The `localFirstAI` field is
marked `@deprecated` for removal in a future release.
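
The migration and the dual-signal check above can be sketched as follows. `StoredSettings`, `migrateRoutingPolicy`, and `isLocalOnly` are hypothetical names, the union is simplified, and treating a missing `localFirstAI` like `false` is an assumption.

```typescript
// Assumption: simplified union; the real RoutingPolicy may have more members.
type RoutingPolicy = 'auto-cheapest' | 'local-only';

interface StoredSettings {
  routingPolicy?: RoutingPolicy;
  /** @deprecated Use routingPolicy === 'local-only' instead. */
  localFirstAI?: boolean;
}

// Runs on settings load: an explicitly set routingPolicy always wins.
function migrateRoutingPolicy(settings: StoredSettings): RoutingPolicy {
  if (settings.routingPolicy !== undefined) {
    return settings.routingPolicy;
  }
  // Assumption: an absent legacy flag is treated the same as false.
  return settings.localFirstAI === true ? 'local-only' : 'auto-cheapest';
}

// Call-site check mirroring `routingPolicy === 'local-only' || localFirstAI`.
function isLocalOnly(settings: StoredSettings): boolean {
  return settings.routingPolicy === 'local-only' || settings.localFirstAI === true;
}
```

Read literally, the combined check still reports local-only for a config that holds `routingPolicy: 'auto-cheapest'` next to a stale `localFirstAI: true`; whether the migration also clears the legacy flag is not stated in this message.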

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cerebras Cloud is the highest-quality free-tier provider on the router's
quality ladder (rank 100, ahead of Groq at 80) but the existing entry in
freeTierConstants.ts was a dead slot: cortexProviderName was null because
no underlying provider plumbing existed, so the router could never pick
it. This adds the missing wiring end to end:

  - new 'cerebras' ProviderName with apiKey setting and OpenAI-compatible
    endpoint (https://api.cerebras.ai/v1)
  - four default models matching the Cerebras free-tier catalogue
    (llama-4-scout, qwen-3-32b, deepseek-r1-distill-llama-70b, llama-3.3-70b)
    with the 8K context cap baked into model options
  - cerebras entry in modelSettingsOfProvider, settings display info,
    apiKey placeholder, and subText link to https://cloud.cerebras.ai
  - dispatch table entry routing chat through the OpenAI-compatible path
  - freeTierConstants.ts: set cortexProviderName: 'cerebras' so the
    ladder will rank a configured Cerebras key ahead of Groq/Gemini

Tests: extend freeTierLadder.test.ts with two Cerebras-specific cases
(top-of-ladder when both have quota; failover to Groq when exhausted).
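
The two test cases can be pictured with a small stand-alone sketch of ladder selection. The ranks 100 and 80 come from this message; `LadderEntry`, `pickFreeTierProvider`, the `'groq'` provider name, and the quota bookkeeping are hypothetical and not the actual `freeTierLadder` code.

```typescript
interface LadderEntry {
  name: string;
  qualityRank: number;
  // null marks a dead slot: no underlying provider plumbing exists.
  cortexProviderName: string | null;
}

const FREE_TIER_LADDER: LadderEntry[] = [
  { name: 'Groq', qualityRank: 80, cortexProviderName: 'groq' },
  { name: 'Cerebras', qualityRank: 100, cortexProviderName: 'cerebras' },
];

// Pick the highest-ranked entry that is wired, has a key, and has quota left.
function pickFreeTierProvider(
  ladder: readonly LadderEntry[],
  configuredKeys: ReadonlySet<string>,
  remainingQuota: ReadonlyMap<string, number>,
): string | null {
  let best: LadderEntry | null = null;
  for (const entry of ladder) {
    const provider = entry.cortexProviderName;
    if (provider === null) continue; // dead slot, can never be picked
    if (!configuredKeys.has(provider)) continue; // no API key configured
    if ((remainingQuota.get(provider) ?? 0) <= 0) continue; // quota exhausted
    if (best === null || entry.qualityRank > best.qualityRank) best = entry;
  }
  return best === null ? null : best.cortexProviderName;
}
```

With keys for both providers, this returns `'cerebras'` while Cerebras has quota and falls back to `'groq'` once the Cerebras quota reaches zero.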

Reasoning models (qwen-3, deepseek-r1-distill) declare reasoning
capabilities with <think> tag parsing; other models declare
reasoningCapabilities: false. Tool calling uses the standard openai-style
format documented in Cerebras's API reference.
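
As a rough illustration of what <think> tag parsing involves, the sketch below separates a single reasoning block from the final answer. `splitReasoning` is a hypothetical helper and handles only complete, non-streamed text; the parser used by the provider plumbing may differ, particularly for streaming.

```typescript
interface SplitOutput {
  reasoning: string;
  answer: string;
}

// Split the first <think>...</think> block out of a complete model response.
function splitReasoning(text: string): SplitOutput {
  const match = /<think>([\s\S]*?)<\/think>/.exec(text);
  if (match === null) {
    // No reasoning block: the whole text is the answer.
    return { reasoning: '', answer: text.trim() };
  }
  const before = text.slice(0, match.index);
  const after = text.slice(match.index + match[0].length);
  return {
    reasoning: (match[1] ?? '').trim(),
    answer: (before + after).trim(),
  };
}
```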

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pterjudin marked this pull request as ready for review May 25, 2026 04:32
Pterjudin merged commit 7190206 into main May 25, 2026
12 of 23 checks passed