[bot] Add HuggingFace Transformers integration for Pipeline text generation, translation, and summarization instrumentation #517

## Summary

The HuggingFace `transformers` library is the foundational Python library for running pretrained AI models locally, with over 150M monthly PyPI downloads and 140k+ GitHub stars. Its `pipeline()` API is the primary high-level execution surface for text generation, summarization, translation, question answering, and other generative tasks using locally loaded models. The latest release is **v5.12.0 (June 12, 2026)**. This repository has zero instrumentation for any `transformers` execution surface — no integration directory, no wrapper, no patcher, no `auto_instrument()` support.

This gap is distinct from the existing `huggingface_hub` integration, which traces **cloud** inference calls made over HTTP through `InferenceClient.chat_completion()` and `InferenceClient.text_generation()`. `transformers.pipeline()` runs models **locally**: no HTTP request is made, so neither `wrap_openai()` nor any other existing cloud API wrapper can trace it.

There is precedent for this kind of integration: frameworks such as DSPy, LlamaIndex, and Agno already have dedicated integrations in this repo that wrap in-process Python calls rather than an HTTP boundary.

## What needs to be instrumented

The `transformers` package (v5.12.0) exposes these execution surfaces, none of which are instrumented:

### Pipeline API (highest priority)

| Function / Method | Description | Return type |
|---|---|---|
| `pipeline(task, model=..., ...)` | Factory function that creates a pipeline for a specific NLP task | Instance of a task-specific `Pipeline` subclass |
| `TextGenerationPipeline.__call__(text_inputs, ...)` | Generate text from a prompt using a causal LM | `list[dict]` with `generated_text` |
| `Text2TextGenerationPipeline.__call__(text_inputs, ...)` | Seq2seq text generation (T5, BART, etc.) | `list[dict]` with `generated_text` |
| `SummarizationPipeline.__call__(documents, ...)` | Summarize input documents | `list[dict]` with `summary_text` |
| `TranslationPipeline.__call__(text_inputs, ...)` | Translate text between languages | `list[dict]` with `translation_text` |
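| `ConversationalPipeline.__call__(conversations, ...)` | Multi-turn conversational generation. Deprecated and removed during the 4.x series, so it is not present in v5.12.0; multi-turn chat now goes through `TextGenerationPipeline` with a list of chat messages (see "Chat template execution") | `Conversation` |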

All of these pipelines share the invocation pattern `pipe(inputs, **kwargs)`, which returns a list of result dicts (a batched text-generation call returns one such list per input). Generation parameters such as `max_new_tokens`, `temperature`, `do_sample`, `top_p`, and `top_k` are passed as keyword arguments and forwarded to the model's `generate()` call.
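
Because each task stores its text under a different key, span-output logging needs a small task-to-key lookup. The following is a minimal sketch, assuming the span should record plain output strings; `extract_output_texts` and `output_key_for_task` are hypothetical helper names, not part of `transformers` or Braintrust.

```python
from typing import Any, Optional

# Result key per task, taken from the return types in the table above.
# Assumption: only these tasks are mapped; any other task yields no text.
OUTPUT_KEYS = {
    "text-generation": "generated_text",
    "text2text-generation": "generated_text",
    "summarization": "summary_text",
    "translation": "translation_text",
}


def output_key_for_task(task: str) -> Optional[str]:
    # Translation tasks carry a language-pair suffix, e.g. "translation_en_to_fr".
    if task.startswith("translation"):
        return OUTPUT_KEYS["translation"]
    return OUTPUT_KEYS.get(task)


def extract_output_texts(task: str, result: Any) -> list:
    """Collect output strings from a pipeline result for span logging."""
    key = output_key_for_task(task)
    if key is None:
        return []
    texts = []
    items = result if isinstance(result, list) else [result]
    for item in items:
        # Batched text-generation calls nest one list of dicts per input.
        group = item if isinstance(item, list) else [item]
        for entry in group:
            if isinstance(entry, dict) and key in entry:
                texts.append(entry[key])
    return texts
```

Inside a wrapper this would be called as `extract_output_texts(pipe.task, result)`, using the pipeline's own `task` attribute.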

### Model-level generation (lower priority)

| Method | Description |
|---|---|
| `PreTrainedModel.generate(inputs, ...)` | Direct model generation — lower-level than Pipeline; used for full control over decoding |

### Chat template execution

| Method | Description |
|---|---|
| `Pipeline.__call__(messages, ...)` with chat template | Pass a list of `{"role": ..., "content": ...}` messages to chat-capable text generation pipelines |

Support for chat-message input was added to `TextGenerationPipeline` during the 4.x series. The pipeline applies the tokenizer's chat template to the messages, which makes it a first-class execution surface for chat completions with locally loaded chat models (Llama 3, Mistral Instruct, Qwen, etc.).
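
For chat calls, the span input should be the message list and the span output the new assistant turn. The sketch below assumes the result shape described in the chat templating guide: with the default `return_full_text=True`, `generated_text` holds the whole conversation with the assistant reply appended as the last message, and with `return_full_text=False` it is assumed to hold only the new text. It handles a single chat per call; `is_chat_input` and `extract_chat_reply` are hypothetical helper names.

```python
from typing import Any, Optional


def is_chat_input(inputs: Any) -> bool:
    """True if inputs look like one chat: a non-empty list of role/content dicts."""
    return (
        isinstance(inputs, list)
        and len(inputs) > 0
        and all(isinstance(m, dict) and "role" in m and "content" in m for m in inputs)
    )


def extract_chat_reply(result: Any) -> Optional[str]:
    """Return the assistant reply from a chat-style text-generation result."""
    if not isinstance(result, list) or not result or not isinstance(result[0], dict):
        return None
    generated = result[0].get("generated_text")
    if isinstance(generated, str):
        # Assumed shape for return_full_text=False: only the new text comes back.
        return generated
    if isinstance(generated, list) and generated:
        # Default shape: the full conversation, with the new reply appended last.
        last = generated[-1]
        if isinstance(last, dict) and last.get("role") == "assistant":
            return last.get("content")
    return None
```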

## Implementation notes

**Task-based dispatch:** `pipeline()` returns an instance of a task-specific `Pipeline` subclass chosen by the `task` argument (`"text-generation"`, `"summarization"`, `"translation_en_to_fr"`, etc.). The integration can patch `Pipeline.__call__()` once at the base class, because the task subclasses listed above override `__call__` but delegate to `super().__call__()`. Alternatively, it can patch individual subclasses to attach task-specific span metadata.
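
Base-class patching can be sketched as a reversible wrapper around `__call__`. This is a minimal sketch, not the integration itself: the `on_call` hook and its signature are hypothetical stand-ins for opening and closing a Braintrust span, and the test exercises it on stand-in classes rather than the real library.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical hook: on_call(pipe, args, kwargs, result, error, duration_s).
# A real integration would start and end a Braintrust span here instead.
OnCall = Callable[[Any, tuple, dict, Any, Any, float], None]


def patch_pipeline_call(pipeline_cls: type, on_call: OnCall) -> Callable[[], None]:
    """Wrap pipeline_cls.__call__ in place and return a function that undoes it."""
    original = pipeline_cls.__call__
    if getattr(original, "_bt_patched", False):
        return lambda: None  # already wrapped; leave the existing patch alone

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            result = original(self, *args, **kwargs)
        except Exception as exc:
            # Report the failure, then re-raise so caller behavior is unchanged.
            on_call(self, args, kwargs, None, exc, time.perf_counter() - start)
            raise
        on_call(self, args, kwargs, result, None, time.perf_counter() - start)
        return result

    wrapper._bt_patched = True
    pipeline_cls.__call__ = wrapper

    def unpatch() -> None:
        pipeline_cls.__call__ = original

    return unpatch
```

Against the real library this would be applied as `patch_pipeline_call(transformers.Pipeline, hook)`. Because the hook sees `self`, it can read the subclass type and `task` for metadata. A subclass that overrides `__call__` without calling `super()` would bypass it; that is the case where patching subclasses individually is needed.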

**Model metadata:** `pipeline.model.config.name_or_path` provides the model identifier (e.g., `"meta-llama/Meta-Llama-3-8B-Instruct"`). `pipeline.task` provides the task name. Both are useful span metadata.
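
Reading these attributes defensively keeps span creation from failing when a pipeline lacks one of them. The following sketch uses only the attributes named above; `pipeline_metadata` and the extra `pipeline_class` field are hypothetical additions.

```python
from typing import Any


def pipeline_metadata(pipe: Any) -> dict:
    """Collect model/task metadata, skipping any attribute that is missing."""
    model = getattr(pipe, "model", None)
    config = getattr(model, "config", None)
    meta = {
        "task": getattr(pipe, "task", None),
        "model": getattr(config, "name_or_path", None),
        # Hypothetical extra field: the concrete Pipeline subclass name.
        "pipeline_class": type(pipe).__name__,
    }
    return {k: v for k, v in meta.items() if v is not None}
```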

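**Token counting:** For text generation pipelines, `len(pipeline.tokenizer.encode(input_text))` gives the prompt token count before generation. `generation_config.max_new_tokens` is only an upper bound, so the number of tokens actually generated has to be measured, for example by tokenizing the output and subtracting the prompt. Note that `TextGenerationPipeline` returns the prompt followed by the completion in `generated_text` by default (`return_full_text=True`).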
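
The following sketch approximates usage for one text-generation result by re-tokenizing text. It assumes only that the tokenizer has an `encode(text, add_special_tokens=...)` method, as Hugging Face tokenizers do. The counts are approximate because decoded text does not always re-encode to the same token ids; `count_tokens` is a hypothetical helper name.

```python
from typing import Any


def count_tokens(tokenizer: Any, prompt: str, generated_text: str, return_full_text: bool = True) -> dict:
    """Approximate prompt/completion token counts for one text-generation result."""
    prompt_tokens = len(tokenizer.encode(prompt))
    if return_full_text and generated_text.startswith(prompt):
        # Default TextGenerationPipeline output echoes the prompt; strip it first.
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Re-encoding decoded text is approximate: decode/encode need not round-trip.
    completion_tokens = len(tokenizer.encode(completion, add_special_tokens=False))
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

Against a real pipeline this would be called as `count_tokens(pipe.tokenizer, prompt, pipe(prompt)[0]["generated_text"])`.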

**Generation config:** Parameters like `max_new_tokens`, `temperature`, `do_sample`, `top_p`, `top_k`, `repetition_penalty` are passed as kwargs to `__call__()` and should be captured in span metadata.
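
Because `__call__` also accepts non-serializable kwargs (a `streamer` object, for example), an allowlist of scalar parameters is a safe way to fill span metadata. This is a sketch under stated assumptions: the allowlist is a hypothetical choice, and when a `generation_config` object is passed, its attributes are read as fallbacks, with explicit kwargs taking precedence.

```python
from typing import Any, Mapping

# Hypothetical allowlist of generation parameters worth recording.
GENERATION_PARAMS = (
    "max_new_tokens", "max_length", "temperature", "do_sample", "top_p",
    "top_k", "repetition_penalty", "num_beams", "num_return_sequences",
    "return_full_text",
)


def generation_metadata(kwargs: Mapping[str, Any]) -> dict:
    """Pick JSON-friendly generation parameters out of pipeline call kwargs."""
    gen_config = kwargs.get("generation_config")
    meta = {}
    for name in GENERATION_PARAMS:
        # Explicit kwargs win; otherwise fall back to a passed generation_config.
        value = kwargs.get(name, getattr(gen_config, name, None))
        if isinstance(value, (bool, int, float, str)):
            meta[name] = value
    return meta
```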

**Batch inputs:** Pipeline `__call__()` accepts a single string or a list of strings. The integration should handle both cases, capturing input length metrics.
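
The single-string and list-of-strings cases can be normalized before computing input metrics. The following is a minimal sketch; the metric names are hypothetical. Other input kinds (chat message lists, datasets, generators) are deliberately skipped, because iterating a generator inside the wrapper would consume it before the pipeline sees it.

```python
from typing import Any


def input_metrics(inputs: Any) -> dict:
    """Batch size and character count for string or list-of-string inputs."""
    if isinstance(inputs, str):
        return {"is_batch": False, "batch_size": 1, "input_chars": len(inputs)}
    if isinstance(inputs, (list, tuple)) and inputs and all(isinstance(x, str) for x in inputs):
        return {"is_batch": True, "batch_size": len(inputs), "input_chars": sum(len(x) for x in inputs)}
    # Chat messages, datasets, and generators are left untouched.
    return {}
```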

**Device/dtype metadata:** `pipeline.device` and `pipeline.torch_dtype` are useful observability fields (e.g., `"cuda:0"`, `torch.float16`).
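
Both values are torch objects (`torch.device`, `torch.dtype`), so they need converting to strings before being logged. In the following sketch, reading `pipeline.model.dtype` first and falling back to `pipeline.torch_dtype` is an assumption meant to tolerate attribute renames across releases; `runtime_metadata` is a hypothetical helper name.

```python
from typing import Any


def runtime_metadata(pipe: Any) -> dict:
    """Stringify device and dtype so they serialize cleanly into span metadata."""
    meta = {}
    device = getattr(pipe, "device", None)
    if device is not None:
        meta["device"] = str(device)  # e.g. torch.device("cuda:0") -> "cuda:0"
    dtype = getattr(getattr(pipe, "model", None), "dtype", None)
    if dtype is None:
        # Assumed fallback for pipelines exposing only the torch_dtype attribute.
        dtype = getattr(pipe, "torch_dtype", None)
    if dtype is not None:
        meta["dtype"] = str(dtype)  # e.g. torch.float16 -> "torch.float16"
    return meta
```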

**No HTTP requests:** Because all computation is local, VCR cassettes (which record and replay HTTP traffic) cannot capture the model calls. Tests should instead use tiny models such as `"sshleifer/tiny-gpt2"` or `"facebook/opt-125m"` loaded from a local cache, or mock the forward pass. CI can then run CPU-only, without a GPU.

## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/transformers/`)
- No wrapper function (e.g. `wrap_transformers()`)
- No patcher in any existing integration
- No nox test session (`test_transformers`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`

A grep for `transformers` across `py/src/braintrust/integrations/` returns zero matches in integration code (only incidental Python import-related comments in non-integration files).

## Braintrust docs status

`not_found` — `transformers` is not listed on the [Braintrust integrations directory](https://www.braintrust.dev/docs/integrations) or the [tracing guide](https://www.braintrust.dev/docs/guides/tracing). There is no `auto_instrument()` reference, no `wrap_transformers()` function, and no `transformers.pipeline()` setup documentation anywhere in Braintrust docs.

## Upstream references

- transformers on PyPI: https://pypi.org/project/transformers/ (v5.12.0, June 12, 2026)
- transformers on GitHub: https://github.com/huggingface/transformers (140k+ stars)
- Pipeline API documentation: https://huggingface.co/docs/transformers/en/main_classes/pipelines
- TextGenerationPipeline reference: https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline
- Chat templates guide: https://huggingface.co/docs/transformers/en/chat_templating
- transformers v5.12.0 release notes: https://github.com/huggingface/transformers/releases/tag/v5.12.0

## Local repo files inspected

- `py/src/braintrust/integrations/` — no `transformers/` directory exists on `main`
- `py/src/braintrust/wrappers/` — no transformers wrapper
- `py/noxfile.py` — no `test_transformers` session
- `py/src/braintrust/integrations/__init__.py` — transformers not listed in integration registry
- `py/src/braintrust/integrations/versioning.py` — no transformers version matrix
- `py/pyproject.toml` `[tool.braintrust.matrix]` — no transformers entry; `lint` dependency group does not include transformers
- `py/src/braintrust/auto.py` — transformers not listed in `auto_instrument()` parameters
- Full repo grep for `transformers` (as import target) across `py/src/braintrust/integrations/` — zero matches in integration code
