Skip to content

fix: infer LLM provider from present key; stop mislabeling unhealthy backend as stopped containers#19

Open
akozak-gd wants to merge 1 commit into
mainfrom
fix/provider-inference-and-backend-ready-message
Open

fix: infer LLM provider from present key; stop mislabeling unhealthy backend as stopped containers#19
akozak-gd wants to merge 1 commit into
mainfrom
fix/provider-inference-and-backend-ready-message

Conversation

@akozak-gd

@akozak-gd akozak-gd commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Problem

An Anthropic-only .env (ANTHROPIC_API_KEY set, no provider chosen) silently defaulted to the OpenRouter provider:

  1. DEFAULT_PROVIDER defaulted to openrouter (config.py) with no derivation from which key is present.
  2. Startup validation requires the active provider's key → OPENROUTER_API_KEY reported missing → StartupValidationError.
  3. app_lifecycle deliberately doesn't crash on that error — it marks healthy=False, so /health/ready returns 503 forever while the containers stay up.
  4. The TUI startup gate saw containers running but backend not ready, and reused the "The SpecFlow containers aren't running." screen — a misleading message, since the containers were fine.

Provider inference already existed in specflow-init.sh (it wrote DEFAULT_PROVIDER=anthropic to .env for the Anthropic-only case), but that only runs at init time — a key set manually or after init bypasses it, and it duplicates the decision the backend also makes.

Fix

Make the provider derived from the key present, in exactly one place (the backend), and remove the DEFAULT_PROVIDER knob so there's nothing to drift.

  • configDEFAULT_PROVIDER is resolved from whichever key is set: OpenRouter when its key is present (also the documented default when both are set), Anthropic when only that key is set, else OpenRouter (which then fails startup fast). It is no longer a configurable knob, so a provider/key mismatch is impossible by construction.
  • startup_validation — with no provider key set at all, the check fails fast naming both keys.
  • init script / docker-compose / .env.quickstart.example / QUICKSTART — drop DEFAULT_PROVIDER entirely (including the now-redundant inference block in specflow-init.sh) so provider resolution lives only in the backend.
  • TUI — distinguish "containers down" from "containers up but backend unhealthy." The latter shows an accurate prompt and re-polls readiness without re-running docker compose up.

Design note / trade-off

You can no longer force Anthropic when both keys are set (both → OpenRouter). This matches the documented "set ONE of the two keys" UX; the escape hatch is to set only the key you want. If a hard override is ever needed, it should be reintroduced as a single explicit knob rather than alongside the key-based derivation.

Not included

--build on docker compose up (raised in review) is intentionally left out — it's a separate, larger change (also needs repo change-detection, not just up/down) and is orthogonal to this bug.

Verification

  • make unit-tests — full backend + mcp_server suites pass. Added coverage for provider resolution and both TUI message paths; rewrote tests that were coupled to the removed knob (test_enums.py "env-overridable" → "env value ignored"; startup tests now control the provider via keys).
  • make check-complexity-diff METRIC=cc — average complexity unchanged (3.66).
  • End-to-end: with only ANTHROPIC_API_KEY in .env, curl -sf http://127.0.0.1:8000/health/ready returns 200 and the TUI proceeds past the gate.

Comment thread mcp_server/tui/app.py

Two modes, selected by ``containers_up``:
* ``False`` (default) — the containers aren't running. ``y`` runs
``docker compose up -d`` then waits for the backend to report ready.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add here flag: --build so on default we always build the image so users do not need to do themselves when they pull our repo. On the other hand we do not verify if there was change to repo just if docker is down or has no containers. So it might actually be part of bigger change for future.

Comment thread backend/app/core/config.py Outdated
# instead of silently defaulting to OpenRouter (which then fails the startup
# key check and leaves /health/ready 503 forever). An explicit DEFAULT_PROVIDER
# is always honored; both keys present keeps the documented OpenRouter default.
if "DEFAULT_PROVIDER" in self.model_fields_set:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually why we need this Flag? I think it is misleading. The docs say use one Key or another. So we should verify if one of them is set, if neither Fail Fast error. If one of them choose the one set. IF both, use open router as default. I would generally remove it starting from .env.example and docker-compose as for me it has no real value. @awrobel-gd ?

@akozak-gd akozak-gd force-pushed the fix/provider-inference-and-backend-ready-message branch 2 times, most recently from e01dd66 to 4df0fb4 Compare July 3, 2026 11:32
@akozak-gd akozak-gd requested a review from mkonopelski-gd July 3, 2026 11:44
Comment thread specflow-init.sh Outdated
log "INFO: DEFAULT_PROVIDER resolved to anthropic from available LLM key"
fi
fi
# The active LLM provider is not persisted here: the backend derives it from

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not think that if we remove the entire Step this commentary is necessary.

Comment thread backend/app/core/config.py Outdated
provider = LLMProvider.ANTHROPIC
else:
provider = LLMProvider.OPENROUTER
object.__setattr__(self, "DEFAULT_PROVIDER", provider)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akozak-gd this is recommended way to do it this kind of setting Attribute based on other attributes:

from functools import cached_property
from pydantic import BaseSettings, computed_field

class Settings(BaseSettings):
    ANTHROPIC_API_KEY: Optional[str] = None
    OPENROUTER_API_KEY: Optional[str] = None

    @computed_field
    @cached_property
    def DEFAULT_PROVIDER(self) -> LLMProvider:
        if self.OPENROUTER_API_KEY:
            return LLMProvider.OPENROUTER
        if self.ANTHROPIC_API_KEY:
            return LLMProvider.ANTHROPIC
        return LLMProvider.OPENROUTER

@akozak-gd akozak-gd force-pushed the fix/provider-inference-and-backend-ready-message branch from 4df0fb4 to fc6ef0d Compare July 3, 2026 11:56
@akozak-gd akozak-gd requested a review from mkonopelski-gd July 3, 2026 12:05
Comment thread backend/app/core/config.py Outdated
Comment on lines +373 to +376
if self.OPENROUTER_API_KEY:
return LLMProvider.OPENROUTER
if self.ANTHROPIC_API_KEY:
return LLMProvider.ANTHROPIC

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless explicitly validated with Pydantic (dont remember if we have it) - dont do if .. as this is the same as if bool(...)

Lets try:

def is_key_valid(raw_key):
    if str(raw_key).strip():
        return raw_key
     else:
        return false

@akozak-gd akozak-gd force-pushed the fix/provider-inference-and-backend-ready-message branch from fc6ef0d to 056282e Compare July 3, 2026 12:34
@akozak-gd akozak-gd requested a review from awrobel-gd July 3, 2026 12:38
…lthy backend as stopped containers

An Anthropic-only .env (ANTHROPIC_API_KEY set, no provider) defaulted to the
OpenRouter provider, failed the startup key check, and left /health/ready at
503 forever. The TUI then reused the 'containers aren't running' screen even
though the containers were up.

- config: DEFAULT_PROVIDER is now derived solely from which key is present
  (openrouter if its key is set — also the default when both are; anthropic when
  only that key is set; neither -> openrouter, which fails startup fast). It is
  no longer a configurable knob, so a provider/key mismatch is impossible.
- startup_validation: with no provider key set, fail fast naming both keys.
- init script / docker-compose / .env.quickstart.example: drop DEFAULT_PROVIDER
  so provider resolution lives in exactly one place (the backend), no drift.
- TUI: distinguish 'containers down' from 'containers up but backend unhealthy';
  the latter re-polls readiness without re-running compose up.
@akozak-gd akozak-gd force-pushed the fix/provider-inference-and-backend-ready-message branch from 056282e to 41c4fd7 Compare July 3, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants