Summary
The Replicate Python SDK (replicate) is the official Python client for Replicate, a platform for running AI models (LLMs, image generation, video, audio, and more) in the cloud. Its replicate.run() and replicate.predictions.create() APIs are the primary execution surfaces. The latest release is v1.0.7 (May 27, 2026). This repository has zero native instrumentation for any replicate SDK execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support.
Braintrust documents a gateway/proxy-based integration with Replicate where users route Replicate calls through the Braintrust gateway using an OpenAI-compatible client. This approach does not cover users of the native replicate Python SDK, which has its own distinct API (replicate.run("owner/model", input={...})). The replicate SDK is not an openai.OpenAI subclass and cannot be wrapped with wrap_openai().
The same pattern exists for other providers in this repo: Mistral, OpenRouter, xAI, Groq, and Together AI all have OpenAI-compatible endpoints, yet received dedicated native integrations because users following official provider documentation and pip install <provider> get zero Braintrust tracing from the OpenAI wrapper alone.
What needs to be instrumented
The replicate package (v1.0.7) exposes these execution surfaces, none of which are instrumented:
Model execution (highest priority)
| API | Description | Return type |
|---|---|---|
| replicate.run(model, input, ...) | Run a model and wait for its output | Any (model-specific: str, list, FileOutput, an iterator of str for LLMs, ...) |
| replicate.stream(model, input, ...) | Stream model output as server-sent events, primarily for LLM token streaming | Iterator[ServerSentEvent] |
replicate.run() accepts any Replicate model identifier (e.g., "meta/meta-llama-3-8b-instruct", "stability-ai/sdxl") and an input dict with model-specific parameters. The identifier encodes the model's owner and name, which is useful span metadata.
LLM streaming: For language models, replicate.run() returns an iterator yielding text tokens. replicate.stream() provides SSE-based streaming for the same use case.
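A minimal sketch of how an integration might accumulate LLM text from replicate.stream(). It assumes each ServerSentEvent carries an event type ("output", "logs", "error", or "done") and a string data payload; FakeEvent and collect_stream_text are hypothetical names used so the example runs offline.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeEvent:
    """Hypothetical stand-in for replicate's ServerSentEvent (event type + data payload)."""

    event: str
    data: str


def collect_stream_text(events: Iterable[FakeEvent]) -> tuple[str, list[str]]:
    """Join "output" payloads into the span output and keep "logs" payloads separately."""
    output_parts: list[str] = []
    log_parts: list[str] = []
    for ev in events:
        if ev.event == "output":
            output_parts.append(ev.data)
        elif ev.event == "logs":
            log_parts.append(ev.data)
        elif ev.event == "error":
            # Surface server-side failures so the span can be marked as errored.
            raise RuntimeError(f"prediction failed: {ev.data}")
        elif ev.event == "done":
            break
    return "".join(output_parts), log_parts


events = [
    FakeEvent("output", "Hello"),
    FakeEvent("logs", "warming up"),
    FakeEvent("output", ", world"),
    FakeEvent("done", "{}"),
]
text, logs = collect_stream_text(events)
print(text)  # Hello, world
```

In a real wrapper the events would be passed through to the caller as they arrive, with the joined text logged only once the stream ends.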
Async execution
| API | Description | Return type |
|---|---|---|
| await replicate.async_run(model, input, ...) | Async equivalent of replicate.run() | Any |
Background predictions
| API | Description | Return type |
|---|---|---|
| replicate.predictions.create(model, input, ...) | Create a prediction without waiting for it to complete | Prediction |
| await replicate.predictions.async_create(model, input, ...) | Async prediction creation | Prediction |
| prediction.wait() | Block until the prediction reaches a terminal state, updating the object in place | None |
Prediction objects expose .status, .output, .logs, .metrics (including predict_time), and .urls (the get, cancel, and stream endpoint URLs).
Implementation notes
Generic execution model: Unlike OpenAI or Anthropic, whose models share a fixed request and response schema, Replicate model inputs and outputs are model-specific. The integration should log the model identifier (owner/model or owner/model:version), the input dict, and the output value. Token count extraction is only possible for models that return usage metadata in their response or logs.
Model identifier: Replicate model identifiers like "meta/meta-llama-3-8b-instruct" or "stability-ai/stable-diffusion-3" encode the model type. The span should capture model (full identifier) and provider: "replicate" in metadata.
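A minimal sketch of turning a string identifier into the proposed metadata fields. model_metadata is a hypothetical helper; it only handles the "owner/model" and "owner/model:version" string forms, and any non-string model references would need separate handling.

```python
def model_metadata(ref: str) -> dict[str, str]:
    """Build span metadata from "owner/model" or "owner/model:version" (hypothetical helper)."""
    name, _, version = ref.partition(":")
    owner, _, model_name = name.partition("/")
    if not owner or not model_name:
        raise ValueError(f"expected an owner/model identifier, got {ref!r}")
    # Keep the full identifier as `model`; split out the version only when present.
    metadata = {"provider": "replicate", "model": ref}
    if version:
        metadata["version"] = version
    return metadata


print(model_metadata("meta/meta-llama-3-8b-instruct"))
# {'provider': 'replicate', 'model': 'meta/meta-llama-3-8b-instruct'}
print(model_metadata("stability-ai/sdxl:abc123")["version"])  # abc123
```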
Output types: LLM models return iterators of text tokens; image models return FileOutput objects; other models may return strings, numbers, lists, or dicts. The integration should handle all cases, collecting token text for LLM outputs and logging output metadata (type, size if file) for others.
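One way to make arbitrary outputs loggable, sketched under the assumption that replicate's FileOutput can be recognized by a url attribute. summarize_output and FakeFileOutput are hypothetical. The sketch logs the file's type and URL rather than its size, since computing a size would mean reading the file; LLM iterators are expected to be consumed by the streaming wrapper before reaching this function.

```python
from typing import Any


def summarize_output(output: Any) -> Any:
    """Make a model-specific Replicate output safe to log on a span (hypothetical helper).

    Assumes file results (replicate's FileOutput) expose a `url` attribute; only their
    type and URL are logged so the integration never downloads file bytes.
    """
    if output is None or isinstance(output, (str, int, float, bool)):
        return output
    if isinstance(output, dict):
        return {key: summarize_output(value) for key, value in output.items()}
    if isinstance(output, (list, tuple)):
        return [summarize_output(item) for item in output]
    if hasattr(output, "url"):
        return {"type": type(output).__name__, "url": str(output.url)}
    # Unknown objects fall back to their type name rather than an unbounded repr().
    return {"type": type(output).__name__}


class FakeFileOutput:
    """Hypothetical stand-in for replicate's FileOutput."""

    def __init__(self, url: str) -> None:
        self.url = url


print(summarize_output([FakeFileOutput("https://example.com/out.png")]))
# [{'type': 'FakeFileOutput', 'url': 'https://example.com/out.png'}]
```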
Streaming LLM output: replicate.run() for LLM models returns a generator. The span should be created before iteration starts and finalized (with full output text) when iteration completes. replicate.stream() follows the same pattern with SSE events.
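The create-before, finalize-after pattern can be sketched as a pass-through generator. RecordingSpan is a hypothetical stand-in for a tracing span, and traced_token_stream is an illustrative name, not an existing Braintrust helper. The start time is captured when the wrapper is created (that is, when run() returns), and the finally block finalizes the span on exhaustion, on error, or when the caller stops early.

```python
import time
from typing import Any, Callable, Iterator, Optional


class RecordingSpan:
    """Hypothetical span stand-in that records what the wrapper logs."""

    def __init__(self) -> None:
        self.logged: dict[str, Any] = {}
        self.ended = False

    def log(self, **fields: Any) -> None:
        self.logged.update(fields)

    def end(self) -> None:
        self.ended = True


def traced_token_stream(
    tokens: Iterator[str],
    span: RecordingSpan,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[str]:
    """Pass tokens through unchanged and finalize the span when iteration stops."""
    start = clock()  # taken when run() returns, before the caller starts iterating

    def generate() -> Iterator[str]:
        first_token_at: Optional[float] = None
        parts: list[str] = []
        try:
            for token in tokens:
                if first_token_at is None:
                    first_token_at = clock()
                parts.append(token)
                yield token
        except Exception as exc:
            span.log(error=repr(exc))
            raise
        finally:
            # Runs on exhaustion, on error, and when the caller closes the generator early.
            metrics = {}
            if first_token_at is not None:
                metrics["time_to_first_token"] = first_token_at - start
            span.log(output="".join(parts), metrics=metrics)
            span.end()

    return generate()


span = RecordingSpan()
for _ in traced_token_stream(iter(["Hel", "lo"]), span):
    pass
print(span.logged["output"], span.ended)  # Hello True
```

A generator that is never iterated never runs its finally block, so a real integration would still need a fallback for streams the caller abandons without consuming.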
predict_time metric: Prediction.metrics["predict_time"] provides server-side inference time in seconds — a useful span metric.
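A small sketch of reading predict_time defensively, for example as prediction_metrics(prediction.metrics) once the prediction has finished. The helper name is hypothetical, and it assumes metrics may be None or lack predict_time while a prediction is still running.

```python
from typing import Any, Mapping, Optional


def prediction_metrics(metrics: Optional[Mapping[str, Any]]) -> dict[str, float]:
    """Copy server-side inference time from Prediction.metrics into span metrics.

    Assumes metrics can be None or missing predict_time while a prediction is still
    running, so the value is only logged once it is present and numeric.
    """
    if not metrics:
        return {}
    predict_time = metrics.get("predict_time")
    if isinstance(predict_time, (int, float)) and not isinstance(predict_time, bool):
        return {"predict_time": float(predict_time)}
    return {}


print(prediction_metrics({"predict_time": 1.5}))  # {'predict_time': 1.5}
print(prediction_metrics(None))  # {}
```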
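Client pattern: The module-level functions (replicate.run(), replicate.stream(), replicate.async_run()) delegate to a default Client instance, so patching Client.run, Client.stream, and Client.async_run (plus Client.async_stream) covers explicitly constructed clients. The module-level names, however, appear to be bound methods taken from the default client at import time; if so, patching the class alone will not reach them, and the patcher must also rebind or wrap the module attributes.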
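The snippet uses a toy Client (not the real replicate class) to show why class-level patching can miss module-level aliases. It assumes replicate/__init__.py binds run = default_client.run at import time, which is worth confirming against the installed replicate version before choosing a patching strategy; patch_run and fake_replicate are hypothetical names.

```python
import functools
import types


class Client:
    """Toy stand-in for replicate.Client; only the shape of run() matters here."""

    def run(self, ref: str, input: dict) -> str:
        return f"ran {ref}"


# Mirrors the assumed module layout: a module-level name bound to a default client's method.
fake_replicate = types.SimpleNamespace()
fake_replicate.default_client = Client()
fake_replicate.run = fake_replicate.default_client.run

calls: list[str] = []


def patch_run(cls: type) -> None:
    original = cls.run

    @functools.wraps(original)
    def traced_run(self, ref, *args, **kwargs):
        calls.append(ref)  # a real integration would open a span here
        return original(self, ref, *args, **kwargs)

    cls.run = traced_run


patch_run(Client)
Client().run("owner/model", input={})        # method looked up on the class: traced
fake_replicate.run("owner/model", input={})  # bound before patching: NOT traced
print(calls)  # ['owner/model']

# Rebinding the module-level alias picks up the patched class method.
fake_replicate.run = fake_replicate.default_client.run
fake_replicate.run("owner/model", input={})
print(calls)  # ['owner/model', 'owner/model']
```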
Authentication: The client reads the REPLICATE_API_TOKEN environment variable or an explicit api_token argument to Client. VCR cassettes must redact the Authorization header, which carries the token.
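One way to keep tokens out of cassettes is a vcrpy before_record_request hook, sketched here with a SimpleNamespace request so it runs without vcrpy installed; vcrpy's built-in filter_headers=["authorization"] option is a simpler alternative. scrub_replicate_auth and VCR_CONFIG are hypothetical names, and how the repo actually wires its VCR configuration is an assumption.

```python
from types import SimpleNamespace
from typing import Any

REDACTED = "REDACTED"


def scrub_replicate_auth(request: Any) -> Any:
    """before_record_request hook: redact the Authorization header before a cassette is written.

    Header names are compared case-insensitively because HTTP header names are case-insensitive.
    """
    for name in list(request.headers):
        if name.lower() == "authorization":
            request.headers[name] = REDACTED
    return request


# Hypothetical wiring; with vcrpy installed this could be vcr.VCR(**VCR_CONFIG).
VCR_CONFIG = {"before_record_request": scrub_replicate_auth}

request = SimpleNamespace(headers={"Authorization": "example-secret", "User-Agent": "replicate-python"})
scrub_replicate_auth(request)
print(request.headers["Authorization"])  # REDACTED
```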
Proposed span shape
| Span field | Content |
|---|---|
| input | model identifier, input dict |
| output | collected text output (LLM); null or file metadata (image/video) |
| metadata | provider: "replicate", model (full identifier), version if specified |
| metrics | predict_time from Prediction.metrics, time_to_first_token (streaming LLMs) |
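A sketch of assembling a record with the proposed shape. build_span_record is a hypothetical helper and the dict layout is illustrative; the actual integration would pass these fields to whatever span-logging call the repo uses.

```python
from typing import Any, Optional


def build_span_record(
    model: str,
    model_input: dict[str, Any],
    output: Any,
    version: Optional[str] = None,
    predict_time: Optional[float] = None,
    time_to_first_token: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble the proposed span fields; all helper and key names are illustrative."""
    metadata: dict[str, Any] = {"provider": "replicate", "model": model}
    if version is not None:
        metadata["version"] = version
    # Only include metrics that were actually observed.
    metrics: dict[str, float] = {}
    if predict_time is not None:
        metrics["predict_time"] = predict_time
    if time_to_first_token is not None:
        metrics["time_to_first_token"] = time_to_first_token
    return {
        "input": {"model": model, "input": model_input},
        "output": output,
        "metadata": metadata,
        "metrics": metrics,
    }


record = build_span_record(
    "meta/meta-llama-3-8b-instruct", {"prompt": "Hi"}, "Hello!", predict_time=0.42
)
print(record["metadata"])  # {'provider': 'replicate', 'model': 'meta/meta-llama-3-8b-instruct'}
```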
No coverage in any instrumentation layer
- No integration directory (py/src/braintrust/integrations/replicate/)
- No wrapper function (e.g. wrap_replicate())
- No patcher in any existing integration
- No nox test session (test_replicate)
- No version entry in py/src/braintrust/integrations/versioning.py
- No mention in py/src/braintrust/integrations/__init__.py
A grep for replicate across py/src/braintrust/ returns zero matches.
Braintrust docs status
unclear — The Braintrust Replicate integration page documents a gateway/proxy approach using an OpenAI client pointed at the Braintrust gateway. This covers users who access Replicate through the OpenAI-compatible gateway endpoint but does not cover native replicate SDK users who call replicate.run() directly. There is no auto_instrument() reference and no wrap_replicate() function documented.
Upstream references
Local repo files inspected
- py/src/braintrust/integrations/: no replicate/ directory exists on main
- py/src/braintrust/wrappers/: no Replicate wrapper
- py/noxfile.py: no test_replicate session
- py/src/braintrust/integrations/__init__.py: Replicate not listed in the integration registry
- py/src/braintrust/integrations/versioning.py: no Replicate version matrix
- py/pyproject.toml [tool.braintrust.matrix]: no Replicate entry
- py/src/braintrust/auto.py: Replicate not listed in auto_instrument() parameters
- Full repo grep for replicate across py/src/braintrust/: zero matches