feat: FetchURL tool supports downloading images #946
- Added 'image' kind to UrlFetchKind and UrlFetchResult
- LocalFetchURLProvider detects image Content-Type and converts to base64
- Tool execution reports image-specific message to the LLM
- Added tests for PNG, JPEG, SVG, and oversized image rejection
- Updated tool description to mention image support
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c88be09fe3
const markdown = ``;
return { content: markdown, kind: 'image' };
Return fetched images as media parts
When this image path runs, the result stays a plain string, so createToolMessage wraps it as a text part rather than an image_url part; FetchURLTool also sends that same string through the 50k-character ToolResultBuilder. In normal model calls the tool result is therefore base64 text, or an invalid truncated data URI for images larger than about 37 KiB, not an attached image the model can inspect. Please return a ContentPart[] image output like ReadMedia does instead of markdown text.
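The "about 37 KiB" figure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the 50,000-character cap mentioned above applies to the whole data URI, and the constant and function names are hypothetical, not identifiers from this PR. Base64 emits 4 characters for every 3 input bytes, so the largest image that survives untruncated is roughly three quarters of the remaining character budget.

```typescript
// Hypothetical constant: the 50k-character cap described in the review comment.
const TOOL_RESULT_CHAR_LIMIT = 50_000;

// Length of the padded base64 encoding of `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

// Length of a complete `data:<mime>;base64,<payload>` URI.
function dataUriLength(byteLength: number, mimeType: string): number {
  return `data:${mimeType};base64,`.length + base64Length(byteLength);
}

// Largest raw image size (in bytes) whose data URI still fits in `charLimit`.
function maxBytesWithinLimit(charLimit: number, mimeType: string): number {
  const prefixLength = `data:${mimeType};base64,`.length;
  const payloadChars = charLimit - prefixLength;
  if (payloadChars < 4) return 0;
  // Every full group of 4 base64 characters carries 3 raw bytes.
  return Math.floor(payloadChars / 4) * 3;
}

const maxPngBytes = maxBytesWithinLimit(TOOL_RESULT_CHAR_LIMIT, "image/png");
console.log(maxPngBytes, (maxPngBytes / 1024).toFixed(1) + " KiB");
```

For `image/png` this works out to 37,482 bytes, or about 36.6 KiB. Any markdown wrapper around the URI lowers the budget by a few more characters, and an image even one byte over the threshold yields a truncated, undecodable data URI.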
Summary
This PR extends the FetchURL tool to support downloading images from URLs and returning them as base64-encoded markdown. This enables the LLM to view images referenced by URL without requiring the user to manually download them.
Changes
Core Changes
Safety
Tests
Motivation
Previously, when a user provided an image URL, the FetchURL tool would attempt to extract text from it (via Readability) and return empty or garbled content. Now the LLM can actually view the image by receiving it as inline base64 markdown.
Backward Compatibility
Fully backward compatible. Non-image URLs continue to work exactly as before. The new 'image' kind is only returned when the server responds with an image Content-Type.
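A minimal sketch of how such a Content-Type gate might look, to make the compatibility claim concrete. This is not the PR's implementation: `classifyContentType` and the `'other'` kind are hypothetical names, and the real UrlFetchKind presumably has its own non-image values.

```typescript
// Hypothetical stand-in for the tool's kind union; only 'image' comes from the PR.
type FetchKind = "image" | "other";

// Decide the result kind from a response's Content-Type header value.
function classifyContentType(header: string | null | undefined): FetchKind {
  if (!header) return "other";
  // Drop parameters such as "; charset=utf-8" and normalize case,
  // since media types are case-insensitive.
  const mediaType = (header.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType.startsWith("image/") ? "image" : "other";
}

console.log(classifyContentType("image/png"));
console.log(classifyContentType("text/html; charset=utf-8"));
```

Because the check keys only on the `image/` media type prefix, responses with a missing header or any non-image type fall through to the existing code path unchanged.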