[bot] Add Replicate Python SDK native integration for run, stream, and predictions execution instrumentation #516

@braintrust-bot

Summary

The Replicate Python SDK (replicate) is the official Python client for Replicate, a platform for running AI models (LLMs, image generation, video, audio, and more) in the cloud. Its replicate.run() and replicate.predictions.create() APIs are the primary execution surfaces. The latest release is v1.0.7 (May 27, 2026). This repository has zero native instrumentation for any replicate SDK execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support.

Braintrust documents a gateway/proxy-based integration with Replicate where users route Replicate calls through the Braintrust gateway using an OpenAI-compatible client. This approach does not cover users of the native replicate Python SDK, which has its own distinct API (replicate.run("owner/model", input={...})). The replicate SDK is not an openai.OpenAI subclass and cannot be wrapped with wrap_openai().

The same pattern exists for other providers in this repo: Mistral, OpenRouter, xAI, Groq, and Together AI all have OpenAI-compatible endpoints, yet each received a dedicated native integration, because users who follow the official provider documentation and run pip install <provider> get no Braintrust tracing from the OpenAI wrapper alone.

What needs to be instrumented

The replicate package (v1.0.7) exposes these execution surfaces, none of which are instrumented:

Model execution (highest priority)

| API | Description | Return type |
|---|---|---|
| replicate.run(model, input, ...) | Synchronously run a model and return the complete output | Any (model-specific: str, list, FileOutput, ...) |
| replicate.stream(model, input, ...) | Stream model output as server-sent events, primarily for LLM token streaming | Iterator[ServerSentEvent] |

replicate.run() accepts any Replicate model identifier (e.g., "meta/meta-llama-3-8b-instruct", "stability-ai/sdxl") and an input dict with model-specific parameters. The identifier encodes the model owner and model name, both of which are useful span metadata.

LLM streaming: For language models, replicate.run() returns an iterator yielding text tokens. replicate.stream() provides SSE-based streaming for the same use case.

Async execution

| API | Description | Return type |
|---|---|---|
| await replicate.async_run(model, input, ...) | Async equivalent of replicate.run() | Any |
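Because async_run is a coroutine, a tracing wrapper has to await the call and emit its span from a finally block so that errors and cancellations still produce a record. A minimal sketch, assuming a plain coroutine function with replicate.async_run's (ref, input=...) calling convention; trace_async_run, the log_span sink, and fake_async_run are hypothetical names for illustration, not Braintrust or Replicate APIs:

```python
import asyncio
import functools
import time
from typing import Any, Awaitable, Callable, Dict, List, Optional


def trace_async_run(
    fn: Callable[..., Awaitable[Any]],
    log_span: Callable[[Dict[str, Any]], None],
) -> Callable[..., Awaitable[Any]]:
    """Wrap an async_run-style coroutine function so each call logs one span record."""

    @functools.wraps(fn)
    async def wrapper(ref: str, *args: Any, **kwargs: Any) -> Any:
        # Accept `input` as a keyword or as the first positional argument after ref.
        model_input = kwargs.get("input", args[0] if args else None)
        start = time.monotonic()
        output: Any = None
        error: Optional[str] = None
        try:
            output = await fn(ref, *args, **kwargs)
            return output
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Runs on success, error, and cancellation, so every call is logged.
            log_span({
                "input": {"model": ref, "input": model_input},
                "output": output,
                "error": error,
                "metadata": {"provider": "replicate", "model": ref},
                "metrics": {"duration": time.monotonic() - start},
            })

    return wrapper


async def fake_async_run(ref: str, input: Optional[Dict[str, Any]] = None) -> str:
    """Stand-in for replicate.async_run so the sketch runs offline."""
    await asyncio.sleep(0)
    return f"echo: {(input or {}).get('prompt')}"


spans: List[Dict[str, Any]] = []
traced_run = trace_async_run(fake_async_run, spans.append)
result = asyncio.run(traced_run("meta/meta-llama-3-8b-instruct", input={"prompt": "hi"}))
```

Applying the same idea to Client.async_run as an unbound method would additionally need to pass self through; the sketch leaves that out.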

Background predictions

| API | Description | Return type |
|---|---|---|
| replicate.predictions.create(model, input, ...) | Create a prediction without waiting for it to complete | Prediction |
| await replicate.predictions.async_create(model, input, ...) | Async prediction creation | Prediction |
| prediction.wait() | Wait for a prediction to complete | Prediction |

Prediction objects expose .status, .output, .logs, .metrics (including predict_time), and .urls (the cancel, get, and stream endpoint URLs).

Implementation notes

Generic execution model: Unlike OpenAI or Anthropic, whose models share a fixed request/response schema, Replicate model inputs and outputs are model-specific. The integration should log the model identifier (owner/model or owner/model:version), the input dict, and the output value. Token counts can only be extracted for models that return usage metadata in their response or logs.

Model identifier: Replicate model identifiers like "meta/meta-llama-3-8b-instruct" or "stability-ai/stable-diffusion-3" encode the model type. The span should capture model (full identifier) and provider: "replicate" in metadata.
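Splitting the identifier yields the model and optional version metadata fields from the proposed span shape. A minimal sketch, assuming identifiers follow the owner/name[:version] form described above; parse_model_ref and span_metadata are hypothetical helper names:

```python
from typing import Any, Dict, Optional, TypedDict


class ModelRef(TypedDict):
    owner: str
    name: str
    version: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split "owner/name" or "owner/name:version" into its parts."""
    base, _, version = ref.partition(":")
    owner, slash, name = base.partition("/")
    if not slash or not owner or not name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    # An empty version (e.g. a trailing ":") is treated as unpinned.
    return {"owner": owner, "name": name, "version": version or None}


def span_metadata(ref: str) -> Dict[str, Any]:
    """Build the metadata block: provider, full identifier, and version when pinned."""
    parsed = parse_model_ref(ref)
    metadata: Dict[str, Any] = {"provider": "replicate", "model": ref}
    if parsed["version"]:
        metadata["version"] = parsed["version"]
    return metadata
```

Keeping the full, unsplit identifier in model preserves exactly what the caller passed, while version is only added when the call pinned one.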

Output types: LLM models return iterators of text tokens; image models return FileOutput objects; other models may return strings, numbers, lists, or dicts. The integration should handle all cases, collecting token text for LLM outputs and logging output metadata (type, size if file) for others.
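One way to handle this mix of output types is a single normalizing function that never consumes iterators. A sketch, assuming file outputs expose a string .url attribute (as FileOutput objects are expected to); summarize_output is a hypothetical helper, and file size is left out because obtaining it may require reading the file:

```python
import collections.abc
from typing import Any


def summarize_output(output: Any) -> Any:
    """Reduce a model-specific output to a JSON-friendly value for the span."""
    # Scalars and strings (e.g. short text or a classification label) are logged as-is.
    if output is None or isinstance(output, (str, int, float, bool)):
        return output
    # File-like results: assume a FileOutput-style object exposing a string `.url`.
    url = getattr(output, "url", None)
    if isinstance(url, str):
        return {"type": type(output).__name__, "url": url}
    if isinstance(output, dict):
        return {str(k): summarize_output(v) for k, v in output.items()}
    if isinstance(output, (list, tuple)):
        return [summarize_output(item) for item in output]
    # Iterators (e.g. LLM token streams) are not consumed here, or the caller would lose the data.
    if isinstance(output, collections.abc.Iterator):
        return {"type": type(output).__name__, "iterator": True}
    return {"type": type(output).__name__}
```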

Streaming LLM output: replicate.run() for LLM models returns a generator. The span should be created before iteration starts and finalized (with full output text) when iteration completes. replicate.stream() follows the same pattern with SSE events.
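The timing requirement matters because a Python generator body does not start running until the first next() call. A minimal sketch, assuming tokens arrive as an iterable whose items stringify to their text; traced_stream and the log_span sink are hypothetical names. The start time is taken eagerly in the outer function, and the span is emitted from the inner generator's finally block once iteration ends:

```python
import time
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def traced_stream(
    tokens: Iterable[Any],
    model: str,
    model_input: Dict[str, Any],
    log_span: Callable[[Dict[str, Any]], None],
) -> Iterator[Any]:
    """Pass stream items through unchanged and log one span when iteration ends."""
    # Taken here, when traced_stream() is called, not when iteration begins.
    start = time.monotonic()

    def generate() -> Iterator[Any]:
        first_token_at: Optional[float] = None
        chunks: List[str] = []
        error: Optional[str] = None
        try:
            for token in tokens:
                if first_token_at is None:
                    first_token_at = time.monotonic()
                chunks.append(str(token))
                yield token
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            metrics = {"duration": time.monotonic() - start}
            if first_token_at is not None:
                metrics["time_to_first_token"] = first_token_at - start
            log_span({
                "input": {"model": model, "input": model_input},
                "output": "".join(chunks),
                "error": error,
                "metadata": {"provider": "replicate", "model": model},
                "metrics": metrics,
            })

    return generate()


spans: List[Dict[str, Any]] = []
stream = traced_stream(iter(["Hel", "lo"]), "meta/meta-llama-3-8b-instruct", {"prompt": "hi"}, spans.append)
logged_before_iteration = len(spans)  # 0: the span is only emitted once the stream ends
text = "".join(stream)
```

If the consumer stops early, the span is logged when the generator is closed or garbage-collected, with whatever text was collected up to that point.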

predict_time metric: Prediction.metrics["predict_time"] provides server-side inference time in seconds — a useful span metric.
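Copying predict_time onto the span only needs a defensive lookup. A sketch, assuming Prediction.metrics is a mapping that may be empty or None until the prediction completes; prediction_metrics is a hypothetical helper:

```python
from typing import Any, Dict, Mapping, Optional


def prediction_metrics(metrics: Optional[Mapping[str, Any]]) -> Dict[str, float]:
    """Copy server-side timing (predict_time, in seconds) from Prediction.metrics."""
    span_metrics: Dict[str, float] = {}
    if not metrics:
        # Assumption: metrics may be missing or None before the prediction finishes.
        return span_metrics
    value = metrics.get("predict_time")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        span_metrics["predict_time"] = float(value)
    return span_metrics
```

In an integration this would be called as prediction_metrics(prediction.metrics) once the prediction has finished.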

Client pattern: The module-level functions (e.g. replicate.run()) delegate to a default Client instance. Patching Client.run, Client.stream, and Client.async_run is intended to cover both module-level calls and explicitly constructed Client instances.
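Patching at the class level means instances created before the patch (such as a default client) are traced too, because the method is looked up on the class at each call. A minimal sketch using a stand-in FakeClient so it runs without the SDK or a token; _traced_run, patch_client_run, and the SPANS list are hypothetical. Only the synchronous run is shown: async_run needs a coroutine wrapper and stream needs a generator-aware one.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# Hypothetical sink standing in for the Braintrust span API.
SPANS: List[Dict[str, Any]] = []


def _traced_run(method: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(method)
    def wrapper(self: Any, ref: str, *args: Any, **kwargs: Any) -> Any:
        # Accept `input` as a keyword or as the first positional argument after ref.
        model_input = kwargs.get("input", args[0] if args else None)
        output = method(self, ref, *args, **kwargs)
        SPANS.append({"model": ref, "input": model_input, "output": output})
        return output

    wrapper.__braintrust_patched__ = True  # marker that makes patching idempotent
    return wrapper


def patch_client_run(client_cls: type) -> None:
    """Patch run() on the class so existing and future instances are both traced."""
    original = getattr(client_cls, "run", None)
    if original is None or getattr(original, "__braintrust_patched__", False):
        return  # method missing in this SDK version, or already patched
    client_cls.run = _traced_run(original)


class FakeClient:
    """Stand-in for replicate.Client so the sketch runs on its own."""

    def run(self, ref: str, input: Optional[Dict[str, Any]] = None, **params: Any) -> str:
        return f"{ref} <- {input}"


_default_client = FakeClient()  # created before patching, like a module-level default


def run(ref: str, input: Optional[Dict[str, Any]] = None, **params: Any) -> str:
    """Mimics a module-level run() that delegates to the default client at call time."""
    return _default_client.run(ref, input=input, **params)


patch_client_run(FakeClient)
patch_client_run(FakeClient)  # second call is a no-op
module_result = run("owner/model", input={"x": 1})
client_result = FakeClient().run("owner/model", {"x": 2})
```

One caveat worth verifying against the installed SDK: if the package binds module-level names at import time (for example run = default_client.run), those bound methods keep pointing at the original function, so the module attributes would need rebinding as well.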

Authentication: The SDK reads the REPLICATE_API_TOKEN environment variable or an explicit api_token argument. VCR cassettes must sanitize the Authorization: Token <key> header.
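A hedged sketch of header scrubbing, assuming vcrpy's before_record_request hook, which receives each request and returns the (possibly modified) request to record. scrub_authorization is a hypothetical name and _FakeRequest stands in for vcrpy's request object; the function only assumes a mutable, dict-like headers attribute:

```python
from typing import Any, Dict


def scrub_authorization(request: Any) -> Any:
    """Redact the API token before a request is written to a cassette.

    Header names are matched case-insensitively; the value is replaced in place.
    """
    for key in list(request.headers):
        if key.lower() == "authorization":
            request.headers[key] = "REDACTED"
    return request


class _FakeRequest:
    """Stand-in for vcrpy's request object so the sketch runs on its own."""

    def __init__(self, headers: Dict[str, str]) -> None:
        self.headers = headers


recorded = scrub_authorization(
    _FakeRequest({"Authorization": "Token r8_example", "Accept": "application/json"})
)
```

If a placeholder value is not needed, vcrpy's built-in filter_headers=["authorization"] option drops the header from recorded requests without a custom hook.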

Proposed span shape

| Span field | Content |
|---|---|
| input | model identifier, input dict |
| output | collected text output (LLM); null or file metadata (image/video) |
| metadata | provider: "replicate", model (full identifier), version if specified |
| metrics | predict_time from Prediction.metrics, time_to_first_token (streaming LLMs) |
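The table can be read as a single payload builder. A sketch, assuming the fields above map directly onto span input, output, metadata, and metrics; build_span_payload is a hypothetical helper, and metrics that are unknown for a given call are omitted rather than logged as null:

```python
from typing import Any, Dict, Optional


def build_span_payload(
    model: str,
    model_input: Dict[str, Any],
    output: Any,
    predict_time: Optional[float] = None,
    time_to_first_token: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble the proposed span fields for one Replicate call."""
    _, sep, version = model.partition(":")
    metadata: Dict[str, Any] = {"provider": "replicate", "model": model}
    if sep and version:
        metadata["version"] = version
    metrics: Dict[str, float] = {}
    if predict_time is not None:
        metrics["predict_time"] = predict_time
    if time_to_first_token is not None:
        metrics["time_to_first_token"] = time_to_first_token
    return {
        "input": {"model": model, "input": model_input},
        "output": output,
        "metadata": metadata,
        "metrics": metrics,
    }
```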

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/replicate/)
  • No wrapper function (e.g. wrap_replicate())
  • No patcher in any existing integration
  • No nox test session (test_replicate)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A grep for replicate across py/src/braintrust/ returns zero matches.

Braintrust docs status

unclear — The Braintrust Replicate integration page documents a gateway/proxy approach using an OpenAI client pointed at the Braintrust gateway. This covers users who access Replicate through the OpenAI-compatible gateway endpoint but does not cover native replicate SDK users who call replicate.run() directly. There is no auto_instrument() reference and no wrap_replicate() function documented.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no replicate/ directory exists on main
  • py/src/braintrust/wrappers/ — no Replicate wrapper
  • py/noxfile.py — no test_replicate session
  • py/src/braintrust/integrations/__init__.py — Replicate not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no Replicate version matrix
  • py/pyproject.toml [tool.braintrust.matrix] — no Replicate entry
  • py/src/braintrust/auto.py — Replicate not listed in auto_instrument() parameters
  • Full repo grep for replicate across py/src/braintrust/ — zero matches

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
