Requesting changes - architectural concern + a few correctness issues
Main blocker: provider branching duplicates infrastructure we already have
This change hardcodes if provider == "openai" / elif "gemini" inside the action, with two SDK imports, two client inits, and two error-mapping blocks. We already have provider abstraction elsewhere in the repo:
- MODEL_REGISTRY + InterfaceType enum (used by VLM/LLM)
- LLMInterface / VLMInterface wrappers in app/llm_interface.py and app/vlm_interface.py that hide the provider-specific SDK calls
- describe_image.py (lines 62–70) is the reference pattern: read the configured provider from MODEL_REGISTRY[provider][InterfaceType.VLM], then delegate
As-is, this PR builds a third parallel provider system that every future image provider (Stability, Replicate, xAI, OpenRouter image, etc.) will have to extend by adding another elif branch here. It also introduces a new image_generation.preferred_provider setting that parallels the existing vlm_provider / llm_provider pattern instead of joining it.
Could we route this through MODEL_REGISTRY with a new InterfaceType.IMAGE_GEN and an ImageGenInterface in agent_core, mirroring how VLMInterface is set up, so generate_image.py ends up looking like describe_image.py? Reusing InterfaceType.VLM directly is tempting since some providers serve both through one endpoint, but the capability sets differ (Claude / ByteDance support VLM but not gen) and users will want to pick providers independently for each.
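Rough sketch of the shape I have in mind, not a drop-in. I'm assuming the registry maps provider -> InterfaceType -> model id (based on the MODEL_REGISTRY[provider][InterfaceType.VLM] lookup in describe_image.py). The model ids, the GeneratedImage type, the "anthropic" key, and the stub backends are all made up for illustration; the real backends would wrap the SDK calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class InterfaceType(Enum):
    LLM = "llm"
    VLM = "vlm"
    IMAGE_GEN = "image_gen"  # proposed new member


@dataclass
class GeneratedImage:
    """Hypothetical result type; the real one would carry image bytes or a path."""
    provider: str
    model: str
    prompt: str


# Assumed registry shape: provider -> interface type -> model id.
# All model ids here are placeholders, not real model names.
MODEL_REGISTRY: Dict[str, Dict[InterfaceType, str]] = {
    "openai": {InterfaceType.VLM: "vlm-model-a", InterfaceType.IMAGE_GEN: "image-model-a"},
    "gemini": {InterfaceType.VLM: "vlm-model-b", InterfaceType.IMAGE_GEN: "image-model-b"},
    # VLM-capable provider with no image generation entry.
    "anthropic": {InterfaceType.VLM: "vlm-model-c"},
}


# Stub backends standing in for the provider-specific SDK calls.
def _openai_backend(model: str, prompt: str) -> GeneratedImage:
    return GeneratedImage("openai", model, prompt)


def _gemini_backend(model: str, prompt: str) -> GeneratedImage:
    return GeneratedImage("gemini", model, prompt)


_BACKENDS: Dict[str, Callable[[str, str], GeneratedImage]] = {
    "openai": _openai_backend,
    "gemini": _gemini_backend,
}


class ImageGenInterface:
    """Hides provider-specific image generation behind one entry point."""

    def __init__(self, provider: str) -> None:
        model = MODEL_REGISTRY.get(provider, {}).get(InterfaceType.IMAGE_GEN)
        if model is None or provider not in _BACKENDS:
            raise ValueError(f"provider {provider!r} has no image generation support")
        self.provider = provider
        self.model = model
        self._backend = _BACKENDS[provider]

    def generate(self, prompt: str) -> GeneratedImage:
        return self._backend(self.model, prompt)
```

With something like this, generate_image.py only needs ImageGenInterface(configured_provider).generate(prompt), and adding a provider means one registry entry plus one backend instead of another elif in the action.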
Other issues worth fixing while you're in here
- OpenAI aspect-ratio map is wrong.
"16:9": "1536x1024" is 3:2, "9:16": "1024x1536" is 2:3. The canvas constraint is real (gpt-image only has 3 sizes), but instead of silently mismapping, at least append a note to warnings. (I skimmed real quick so please verify)
- Silent 4K downgrade for OpenAI.
"4K": "high" returns at most 1536×1024. Either reject 4K for OpenAI or warn.
- quality dropped on the edit path.
images.generate(..., quality=...) is passed, but images.edit(...) isn't - reference-image runs silently render at lower quality.
- images.edit ≠ "style reference."
The existing reference_images field is documented as style guidance (how Gemini uses them). OpenAI's images.edit treats inputs as compositional/mask inputs. Same input, very different output between providers.
- Provider-selection UX doesn't match the PR description.
The description says "asks the user" when both keys are present, but the code silently defaults to Gemini - there's no signal in the response telling the calling LLM that a choice is available. Once provider_preference is saved, there's also no way to clear it.
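For the aspect-ratio and 4K bullets above, here is a minimal sketch of picking the nearest canvas and reporting the mismatch instead of hiding it. The function name and return shape are hypothetical. The two non-square sizes are the ones already in the PR's map; 1024x1024 as the third size is my assumption, so please check it against the API docs.

```python
from fractions import Fraction
from typing import List, Tuple

# Assumed canvas sizes for gpt-image (the review notes there are only 3).
OPENAI_SIZES: List[Tuple[int, int]] = [(1024, 1024), (1536, 1024), (1024, 1536)]


def resolve_openai_size(aspect_ratio: str, resolution: str = "1K") -> Tuple[str, List[str]]:
    """Return the closest supported size plus warnings for any lossy mapping."""
    warnings: List[str] = []

    width, height = (int(part) for part in aspect_ratio.split(":"))
    requested = Fraction(width, height)

    # Pick the canvas whose width/height ratio is closest to the request.
    best = min(OPENAI_SIZES, key=lambda s: abs(Fraction(s[0], s[1]) - requested))
    actual = Fraction(best[0], best[1])

    if actual != requested:
        warnings.append(
            f"Aspect ratio {aspect_ratio} is not available; "
            f"using {best[0]}x{best[1]} ({actual.numerator}:{actual.denominator})."
        )
    if resolution.upper() == "4K":
        warnings.append("4K is not available for this provider; largest canvas is 1536x1024.")

    return f"{best[0]}x{best[1]}", warnings
```

resolve_openai_size("16:9") still returns "1536x1024", but now with a warning saying the result is 3:2, which the action can append to warnings in the response. Rejecting 4K outright instead of warning would also be fine.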