[V1.3.3] Image/Video generation interfaces

## Requesting changes - architectural concern + a few correctness issues

### Main blocker: provider branching duplicates infrastructure we already have

This change hardcodes `if provider == "openai" / elif "gemini"` inside the action, with two SDK imports, two client inits, and two error-mapping blocks. We already have provider abstraction elsewhere in the repo:

- `MODEL_REGISTRY` + `InterfaceType` enum (used by VLM/LLM)
- `LLMInterface` / `VLMInterface` wrappers in `app/llm_interface.py` and `app/vlm_interface.py` that hide the provider-specific SDK calls
- `describe_image.py` (lines 62–70) is the reference pattern: read the configured provider from `MODEL_REGISTRY[provider][InterfaceType.VLM]`, then delegate

As-is, this PR builds a third parallel provider system that every future image provider (Stability, Replicate, xAI, OpenRouter image, etc.) will have to extend by adding another `elif` branch here. It also introduces a new `image_generation.preferred_provider` setting that parallels the existing `vlm_provider` / `llm_provider` pattern instead of joining it.

Could we route this through `MODEL_REGISTRY` with a new `InterfaceType.IMAGE_GEN` and an `ImageGenInterface` in `agent_core`, mirroring how `VLMInterface` is set up, so `generate_image.py` ends up looking like `describe_image.py`? Reusing `InterfaceType.VLM` directly is tempting since some providers serve both through one endpoint, but the capability sets differ (Claude / ByteDance support VLM but not gen) and users will want to pick providers independently for each.
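To make the shape concrete, here's a rough sketch of what I have in mind. Everything in it is a stand-in: the `MODEL_REGISTRY` layout, the provider keys, the placeholder model names, `_BACKENDS`, and the stub backend functions are illustrative assumptions, not the repo's actual definitions, and the stubs return fake bytes rather than calling any SDK.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class InterfaceType(Enum):
    # Assumed shape; the real enum in the repo may differ.
    LLM = "llm"
    VLM = "vlm"
    IMAGE_GEN = "image_gen"  # proposed new member


# Hypothetical registry layout: provider -> interface type -> model name.
# Model names are placeholders. None means "capability not offered".
MODEL_REGISTRY: Dict[str, Dict[InterfaceType, Optional[str]]] = {
    "openai": {
        InterfaceType.VLM: "placeholder-vlm-model",
        InterfaceType.IMAGE_GEN: "placeholder-image-model",
    },
    "gemini": {
        InterfaceType.VLM: "placeholder-vlm-model",
        InterfaceType.IMAGE_GEN: "placeholder-image-model",
    },
    # VLM-only provider: this is why IMAGE_GEN can't just reuse VLM.
    "anthropic": {
        InterfaceType.VLM: "placeholder-vlm-model",
        InterfaceType.IMAGE_GEN: None,
    },
}

# A backend takes (model, prompt) and returns image bytes.
Backend = Callable[[str, str], bytes]


def _openai_backend(model: str, prompt: str) -> bytes:
    # Stub: a real backend would call the provider SDK here.
    return f"openai:{model}:{prompt}".encode()


def _gemini_backend(model: str, prompt: str) -> bytes:
    # Stub: a real backend would call the provider SDK here.
    return f"gemini:{model}:{prompt}".encode()


_BACKENDS: Dict[str, Backend] = {
    "openai": _openai_backend,
    "gemini": _gemini_backend,
}


class ImageGenInterface:
    """Hides provider-specific calls, in the spirit of VLMInterface."""

    def __init__(self, provider: str) -> None:
        model = MODEL_REGISTRY.get(provider, {}).get(InterfaceType.IMAGE_GEN)
        if model is None:
            raise ValueError(f"provider {provider!r} has no image generation model")
        self.provider = provider
        self.model = model
        self._backend = _BACKENDS[provider]

    def generate(self, prompt: str) -> bytes:
        return self._backend(self.model, prompt)


def supported_providers() -> List[str]:
    """Providers the user could choose between for image generation."""
    return [p for p, m in MODEL_REGISTRY.items() if m.get(InterfaceType.IMAGE_GEN)]
```

With something like this, the action only needs to build `ImageGenInterface(provider)` and call `generate(...)`, and adding a provider becomes a registry entry plus a backend rather than another `elif`.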

### Other issues worth fixing while you're in here

- **OpenAI aspect-ratio map is wrong.** `"16:9": "1536x1024"` is 3:2, and `"9:16": "1024x1536"` is 2:3. The canvas constraint is real (gpt-image only has 3 sizes), but the mismatch shouldn't be silent: at least append a note to `warnings`. (I only skimmed this, so please verify.)
- **Silent 4K downgrade for OpenAI.** `"4K": "high"` returns at most 1536×1024. Either reject 4K for OpenAI or warn.
- **`quality` dropped on the edit path.** `quality` is passed to `images.generate(..., quality=...)` but not to `images.edit(...)`, so reference-image runs silently render at lower quality.
- **`images.edit` ≠ "style reference."** The existing `reference_images` field is documented as style guidance (how Gemini uses them). OpenAI's `images.edit` treats inputs as compositional/mask inputs. Same input, very different output between providers.
- **Provider-selection UX doesn't match the PR description.** The description says "asks the user" when both keys are present, but the code silently defaults to Gemini - there's no signal in the response telling the calling LLM that a choice is available. Once `provider_preference` is saved, there's also no way to clear it.
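For the aspect-ratio point, here's a minimal sketch of picking the closest canvas and recording a warning when it isn't an exact match. `pick_size` and `SUPPORTED_SIZES` are hypothetical names, and the size list is my assumption about what gpt-image accepts, so please check it against the SDK docs before relying on it.

```python
from fractions import Fraction
from typing import List, Tuple

# Assumed supported canvases as (width, height); verify against the SDK docs.
SUPPORTED_SIZES: List[Tuple[int, int]] = [(1024, 1024), (1536, 1024), (1024, 1536)]


def pick_size(aspect_ratio: str, warnings: List[str]) -> str:
    """Return the closest supported 'WxH' and append a warning if it is inexact."""
    w, h = (int(part) for part in aspect_ratio.split(":"))
    requested = Fraction(w, h)

    # Closest canvas by absolute difference in width/height ratio.
    best = min(SUPPORTED_SIZES, key=lambda s: abs(Fraction(s[0], s[1]) - requested))
    actual = Fraction(best[0], best[1])

    if actual != requested:
        warnings.append(
            f"requested {aspect_ratio}, closest supported canvas is "
            f"{best[0]}x{best[1]} ({actual.numerator}:{actual.denominator})"
        )
    return f"{best[0]}x{best[1]}"


warnings: List[str] = []
print(pick_size("16:9", warnings))  # 1536x1024
print(warnings[0])  # requested 16:9, closest supported canvas is 1536x1024 (3:2)
```

The same `warnings` pattern would cover the 4K case: keep the downgrade if that's the intended behavior, but tell the caller it happened.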

[V1.3.3] Image/Video generation interfaces #294
