[bot] Add Replicate Python SDK native integration for run, stream, and predictions execution instrumentation

## Summary

The Replicate Python SDK (`replicate`) is the official Python client for Replicate, a platform for running AI models (LLMs, image generation, video, audio, and more) in the cloud. Its `replicate.run()` and `replicate.predictions.create()` APIs are the primary execution surfaces. The latest release is **v1.0.7 (May 27, 2026)**. This repository has zero native instrumentation for any `replicate` SDK execution surface — no integration directory, no wrapper, no patcher, no `auto_instrument()` support.

Braintrust documents a [gateway/proxy-based integration with Replicate](https://www.braintrust.dev/docs/integrations/ai-providers/replicate) where users route Replicate calls through the Braintrust gateway using an OpenAI-compatible client. This approach does not cover users of the **native `replicate` Python SDK**, which has its own distinct API (`replicate.run("owner/model", input={...})`). The `replicate` SDK is not an `openai.OpenAI` subclass and cannot be wrapped with `wrap_openai()`.

The same pattern exists for other providers in this repo: Mistral, OpenRouter, xAI, Groq, and Together AI all have OpenAI-compatible endpoints, yet received dedicated native integrations because users following official provider documentation and `pip install <provider>` get zero Braintrust tracing from the OpenAI wrapper alone.

## What needs to be instrumented

The `replicate` package (v1.0.7) exposes these execution surfaces, none of which are instrumented:

### Model execution (highest priority)

| API | Description | Return type |
|---|---|---|
| `replicate.run(model, input, ...)` | Synchronously run a model and return the complete output | `Any` (model-specific: str, list, FileOutput, ...) |
| `replicate.stream(model, input, ...)` | Stream model output as server-sent events — primarily for LLM token streaming | `Iterator[ServerSentEvent]` |

`replicate.run()` accepts any Replicate model identifier (e.g., `"meta/meta-llama-3-8b-instruct"`, `"stability-ai/sdxl"`) and an `input` dict with model-specific parameters. The model identifier encodes the provider and model name — useful for span metadata.

**LLM streaming:** For language models, `replicate.run()` returns an iterator yielding text tokens. `replicate.stream()` provides SSE-based streaming for the same use case.

### Async execution

| API | Description | Return type |
|---|---|---|
| `await replicate.async_run(model, input, ...)` | Async equivalent of `replicate.run()` | `Any` |

### Background predictions

| API | Description | Return type |
|---|---|---|
| `replicate.predictions.create(model, input, ...)` | Create a prediction without waiting for it to complete | `Prediction` |
| `await replicate.predictions.async_create(model, input, ...)` | Async prediction creation | `Prediction` |
| `prediction.wait()` | Wait for a prediction to complete | `Prediction` |

`Prediction` objects expose `.status`, `.output`, `.logs`, `.metrics` (includes `predict_time`), and `.urls` (cancel, get, and stream URLs).

## Implementation notes

**Generic execution model:** Unlike OpenAI or Anthropic, where model names map to a known schema, Replicate model inputs and outputs are model-specific. The integration should log the model identifier (`owner/model` or `owner/model:version`), the input dict, and the output value. Token count extraction is only possible for models that return usage metadata in their response or logs.

**Model identifier:** Replicate model identifiers like `"meta/meta-llama-3-8b-instruct"` or `"stability-ai/stable-diffusion-3"` encode the owner and model name, optionally followed by `:version`. The span should capture `model` (full identifier) and `provider: "replicate"` in metadata.
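
A minimal sketch of how the identifier could be split for span metadata. `ModelRef`, `parse_model_ref`, and `span_metadata` are hypothetical names, not part of the `replicate` SDK or this repo, and the sketch assumes only the `owner/model` and `owner/model:version` forms mentioned above.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    """Parts of a Replicate model identifier (hypothetical helper type)."""

    owner: str
    name: str
    version: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split "owner/name" or "owner/name:version" into its parts."""
    path, _, version = ref.partition(":")
    owner, sep, name = path.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    return ModelRef(owner, name, version or None)


def span_metadata(ref: str) -> dict:
    """Build the metadata dict proposed for the span."""
    parsed = parse_model_ref(ref)
    metadata = {"provider": "replicate", "model": ref}
    if parsed.version:
        # Only recorded when the caller pinned a version.
        metadata["version"] = parsed.version
    return metadata
```

Keeping the full identifier in `model` and the pinned version in a separate key would let traces be filtered by either one.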

**Output types:** LLM models return iterators of text tokens; image models return `FileOutput` objects; other models may return strings, numbers, lists, or dicts. The integration should handle all cases, collecting token text for LLM outputs and logging output metadata (type, size if file) for others.
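
One way the output handling could look, as a sketch only. `summarize_output` is a hypothetical helper, and the `url` attribute on file-like outputs is an assumption about the object being logged. File size is left out because it may not be known without reading the file.

```python
from collections.abc import Iterator
from typing import Any


def summarize_output(output: Any) -> Any:
    """Return a JSON-friendly value to log as the span output."""
    if output is None or isinstance(output, (str, int, float, bool)):
        return output
    if isinstance(output, dict):
        return {key: summarize_output(value) for key, value in output.items()}
    if isinstance(output, (list, tuple)):
        return [summarize_output(item) for item in output]
    if isinstance(output, Iterator):
        # Consumes the iterator, so this is only safe once the caller is done with it.
        return "".join(str(chunk) for chunk in output)
    # Assumption: file-like outputs expose a `url` attribute; log metadata, not bytes.
    summary = {"type": type(output).__name__}
    url = getattr(output, "url", None)
    if url is not None:
        summary["url"] = str(url)
    return summary
```

Because the iterator branch drains its input, a real integration would more likely collect text while the caller iterates instead of calling this on a live stream.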

**Streaming LLM output:** `replicate.run()` for LLM models returns a generator. The span should be created before iteration starts and finalized (with full output text) when iteration completes. `replicate.stream()` follows the same pattern with SSE events.
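
The lifecycle described above can be sketched with a plain generator wrapper. `traced_stream` and its `on_finish` callback are hypothetical stand-ins for the real span API, and the sketch assumes `str(chunk)` yields the chunk's text.

```python
import time
from typing import Any, Callable, Iterable, Iterator


def traced_stream(
    chunks: Iterable[Any],
    on_finish: Callable[[dict], None],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Any]:
    """Yield chunks unchanged, then report collected text and timing."""
    start = clock()  # taken at call time, before iteration starts

    def generate() -> Iterator[Any]:
        first_chunk_at = None
        parts = []
        error = None
        try:
            for chunk in chunks:
                if first_chunk_at is None:
                    first_chunk_at = clock()
                parts.append(str(chunk))
                yield chunk
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Runs on normal completion, on error, and when the consumer stops early.
            ttft = None if first_chunk_at is None else first_chunk_at - start
            on_finish({
                "output": "".join(parts),
                "metrics": {"time_to_first_token": ttft},
                "error": error,
            })

    return generate()
```

Reporting from `finally` means a consumer that abandons the stream partway still produces a span with the partial text collected so far.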

**`predict_time` metric:** `Prediction.metrics["predict_time"]` provides server-side inference time in seconds — a useful span metric.
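
A defensive way to read that metric, sketched under the assumption that `metrics` may be missing or `None` on predictions that have not finished. `predict_time_seconds` is a hypothetical helper.

```python
from typing import Any, Optional


def predict_time_seconds(prediction: Any) -> Optional[float]:
    """Return `metrics["predict_time"]` as a float, or None if absent."""
    metrics = getattr(prediction, "metrics", None)
    if not isinstance(metrics, dict):
        return None
    value = metrics.get("predict_time")
    # bool is a subclass of int, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```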

**Client pattern:** The module-level functions (such as `replicate.run()`) delegate to a default `Client` instance. Patching `Client.run`, `Client.stream`, and `Client.async_run` therefore covers both module-level calls and calls made on an explicitly constructed `Client`.
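
A rough sketch of class-level patching, using a stand-in class so it runs without the SDK or network access. `patch_method`, `FakeClient`, and the `record` callback are hypothetical; the real integration would follow this repo's existing patcher conventions.

```python
import functools
from typing import Callable


def patch_method(cls: type, name: str, record: Callable[[dict], None]) -> None:
    """Replace `cls.<name>` with a wrapper that records each call."""
    original = getattr(cls, name)
    if getattr(original, "_is_traced", False):
        return  # already patched; avoid double wrapping

    @functools.wraps(original)
    def wrapper(self, ref, input=None, **kwargs):
        event = {"method": name, "model": ref, "input": input}
        try:
            result = original(self, ref, input, **kwargs)
            event["output"] = result
            return result
        except Exception as exc:
            event["error"] = repr(exc)
            raise
        finally:
            record(event)  # called once per invocation, success or failure

    wrapper._is_traced = True
    setattr(cls, name, wrapper)


class FakeClient:
    """Stand-in for the SDK client (assumption: same positional arguments)."""

    def run(self, ref, input=None, **kwargs):
        return f"ran {ref}"
```

Because the wrapper is installed on the class, every instance picks it up, including a default instance created before patching. An async method such as `Client.async_run` would need an `async def` wrapper that awaits the original.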

**Authentication:** Uses the `REPLICATE_API_TOKEN` env var or an explicit `api_token` kwarg. VCR cassettes must have the `Authorization` header, which carries the token, sanitized before they are committed.

## Proposed span shape

| Span field | Content |
|---|---|
| **input** | `model` identifier, `input` dict |
| **output** | collected text output (LLM), `null` or file metadata (image/video) |
| **metadata** | `provider: "replicate"`, `model` (full identifier), `version` if specified |
| **metrics** | `predict_time` from `Prediction.metrics`, `time_to_first_token` (streaming LLMs) |

## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/replicate/`)
- No wrapper function (e.g. `wrap_replicate()`)
- No patcher in any existing integration
- No nox test session (`test_replicate`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`

A grep for `replicate` across `py/src/braintrust/` returns zero matches.

## Braintrust docs status

`unclear` — The [Braintrust Replicate integration page](https://www.braintrust.dev/docs/integrations/ai-providers/replicate) documents a **gateway/proxy approach** using an OpenAI client pointed at the Braintrust gateway. This covers users who access Replicate through the OpenAI-compatible gateway endpoint but does not cover native `replicate` SDK users who call `replicate.run()` directly. There is no `auto_instrument()` reference and no `wrap_replicate()` function documented.

## Upstream references

- replicate on PyPI: https://pypi.org/project/replicate/ (v1.0.7, May 27, 2026)
- replicate Python SDK on GitHub: https://github.com/replicate/replicate-python
- Replicate Python quickstart: https://replicate.com/docs/get-started/python
- Replicate API reference: https://replicate.com/docs/reference/http
- Braintrust Replicate docs (gateway approach): https://www.braintrust.dev/docs/integrations/ai-providers/replicate

## Local repo files inspected

- `py/src/braintrust/integrations/` — no `replicate/` directory exists on `main`
- `py/src/braintrust/wrappers/` — no Replicate wrapper
- `py/noxfile.py` — no `test_replicate` session
- `py/src/braintrust/integrations/__init__.py` — Replicate not listed in integration registry
- `py/src/braintrust/integrations/versioning.py` — no Replicate version matrix
- `py/pyproject.toml` `[tool.braintrust.matrix]` — no Replicate entry
- `py/src/braintrust/auto.py` — Replicate not listed in `auto_instrument()` parameters
- Full repo grep for `replicate` across `py/src/braintrust/` — zero matches
