[bot] Add HuggingFace Transformers integration for Pipeline text generation, translation, and summarization instrumentation #517

@braintrust-bot

Summary

The HuggingFace transformers library is the foundational Python library for running pretrained AI models locally, with over 150M monthly PyPI downloads and 140k+ GitHub stars. Its pipeline() API is the primary high-level execution surface for text generation, summarization, translation, question answering, and other generative tasks using locally loaded models. The latest release is v5.12.0 (June 12, 2026). This repository has zero instrumentation for any transformers execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support.

This gap is distinct from the existing huggingface_hub integration, which traces cloud inference calls through InferenceClient.chat_completion() and InferenceClient.text_generation() over HTTP. transformers.pipeline() runs models locally — no HTTP request is made, and wrap_openai() or any existing cloud API wrapper cannot trace it.

Other in-process frameworks (DSPy, LlamaIndex, Agno) already have dedicated integrations in this repo that wrap Python function calls rather than an HTTP boundary.

What needs to be instrumented

The transformers package (v5.12.0) exposes these execution surfaces, none of which are instrumented:

Pipeline API (highest priority)

| Function / Method | Description | Return type |
| --- | --- | --- |
| pipeline(task, model=..., ...) | Factory function creating a pipeline for a specific NLP task | Pipeline subclass |
| TextGenerationPipeline.__call__(text_inputs, ...) | Generate text from a prompt using a causal LM | list[dict] with generated_text |
| Text2TextGenerationPipeline.__call__(text_inputs, ...) | Seq2seq text generation (T5, BART, etc.) | list[dict] with generated_text |
| SummarizationPipeline.__call__(documents, ...) | Summarize input documents | list[dict] with summary_text |
| TranslationPipeline.__call__(text_inputs, ...) | Translate text between languages | list[dict] with translation_text |
| ConversationalPipeline.__call__(conversations, ...) | Multi-turn conversational generation | Conversation |

All pipeline __call__ methods share a common invocation pattern: pipe(input_text, **kwargs) returning a list of result dicts. Each pipeline type exposes task-specific generation parameters (max_new_tokens, temperature, do_sample, top_p, top_k, etc.).
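Because each task uses a different key for its text, a span logger needs one place that maps a result to its output text. The following is a minimal sketch, not the integration's actual code: the helper name extract_output_texts is hypothetical, and the key list is taken only from the return types listed in the table above. It also assumes batched calls may nest results one level deeper, so it flattens recursively.

```python
from typing import Any, List

# Output keys taken from the return types listed in the table above.
OUTPUT_TEXT_KEYS = ("generated_text", "summary_text", "translation_text")


def extract_output_texts(results: Any) -> List[Any]:
    """Collect the task-specific text field from pipeline results.

    Hypothetical helper: accepts a single result dict, a list of dicts,
    or a nested list of dicts, and returns the text fields in order.
    """
    if isinstance(results, dict):
        for key in OUTPUT_TEXT_KEYS:
            if key in results:
                # Do not recurse into the value: for chat inputs it may
                # itself be a list of message dicts.
                return [results[key]]
        return []
    if isinstance(results, (list, tuple)):
        texts: List[Any] = []
        for item in results:
            texts.extend(extract_output_texts(item))
        return texts
    # Unknown result shape: report nothing rather than guess.
    return []
```

For example, extract_output_texts([{"summary_text": "short"}]) yields a one-element list containing "short", regardless of which pipeline produced it.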

Model-level generation (lower priority)

| Method | Description |
| --- | --- |
| PreTrainedModel.generate(inputs, ...) | Direct model generation; lower-level than Pipeline and used for full control over decoding |

Chat template execution

| Method | Description |
| --- | --- |
| Pipeline.__call__(messages, ...) with chat template | Pass a list of {"role": ..., "content": ...} messages to chat-capable text generation pipelines |

Chat template support was added to TextGenerationPipeline in recent releases, making it a first-class execution surface for chat completions using locally loaded chat models (Llama 3, Mistral Instruct, Qwen, etc.).
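An integration would need to tell chat-style inputs apart from plain prompts so that messages can be logged in a structured form. The sketch below shows one way to do that. Both helper names are hypothetical, and it assumes that for chat inputs the pipeline returns the conversation with a final assistant message appended under generated_text, which should be verified against the targeted transformers version.

```python
from typing import Any, Optional


def is_chat_input(inputs: Any) -> bool:
    """Return True if inputs look like one conversation.

    A conversation here is a non-empty list of dicts that each carry
    "role" and "content" keys, as described in the table above.
    """
    return (
        isinstance(inputs, (list, tuple))
        and len(inputs) > 0
        and all(
            isinstance(m, dict) and "role" in m and "content" in m
            for m in inputs
        )
    )


def final_assistant_content(generated_text: Any) -> Optional[str]:
    """Return the last assistant message's content, if there is one.

    Assumption: chat output is the message list with the new assistant
    turn appended. Returns None for plain-string outputs.
    """
    if is_chat_input(generated_text):
        last = generated_text[-1]
        if last.get("role") == "assistant":
            return last["content"]
    return None
```

A list of plain strings is treated as a batch of prompts rather than a conversation, which keeps batch handling and chat handling separate.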

Implementation notes

Task-based dispatch: pipeline() returns a subclass of Pipeline based on the task argument ("text-generation", "summarization", "translation_en_to_fr", etc.). The integration should patch Pipeline.__call__() at the base class level — all pipeline subclasses call through the same __call__ chain — or patch individual subclasses for task-specific span metadata.
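The base-class approach can be sketched without loading transformers at all. In the example below, StubPipeline and StubTextGenerationPipeline are stand-ins for the real classes, and patch_call and RECORDED_SPANS are hypothetical names; a real integration would emit a tracing span instead of appending to a list. The point it illustrates is that patching the base __call__ is enough when subclasses delegate to it through super().

```python
import functools
import time


class StubPipeline:
    """Stand-in for the Pipeline base class (not the real one)."""

    task = "stub"

    def __call__(self, inputs, **kwargs):
        return [{"generated_text": f"{inputs}!"}]


class StubTextGenerationPipeline(StubPipeline):
    task = "text-generation"

    def __call__(self, text_inputs, **kwargs):
        # Subclass delegates to the base __call__, as described above.
        return super().__call__(text_inputs, **kwargs)


RECORDED_SPANS = []  # hypothetical sink standing in for a tracing backend


def patch_call(cls):
    """Wrap cls.__call__ once so every call records a span-like dict."""
    original = cls.__call__
    if getattr(original, "_traced", False):
        return  # already patched; keep patching idempotent

    @functools.wraps(original)
    def traced_call(self, *args, **kwargs):
        start = time.perf_counter()
        output = original(self, *args, **kwargs)
        RECORDED_SPANS.append({
            "task": getattr(self, "task", None),
            "input": args[0] if args else None,
            "output": output,
            "metadata": dict(kwargs),
            "duration_s": time.perf_counter() - start,
        })
        return output

    traced_call._traced = True
    cls.__call__ = traced_call


patch_call(StubPipeline)
pipe = StubTextGenerationPipeline()
result = pipe("hello", max_new_tokens=5)
```

Since the subclass reaches the base method through super(), the call is recorded once, and self.task still reports the subclass task. Patching individual subclasses instead would only be needed for metadata the base class cannot see.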

Model metadata: pipeline.model.config.name_or_path provides the model identifier (e.g., "meta-llama/Meta-Llama-3-8B-Instruct"). pipeline.task provides the task name. Both are useful span metadata.
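A defensive way to read these attributes might look like the sketch below. The helper name pipeline_metadata and the metadata keys are hypothetical. It uses getattr with defaults so that a pipeline missing any of the attributes does not break tracing, and it is exercised here with a SimpleNamespace stand-in rather than a real pipeline.

```python
from types import SimpleNamespace
from typing import Any, Dict


def pipeline_metadata(pipe: Any) -> Dict[str, Any]:
    """Read the model identifier and task name, skipping missing fields."""
    model = getattr(pipe, "model", None)
    config = getattr(model, "config", None)
    metadata = {
        "model": getattr(config, "name_or_path", None),
        "task": getattr(pipe, "task", None),
    }
    # Drop fields that could not be resolved instead of logging None.
    return {key: value for key, value in metadata.items() if value is not None}


# Stand-in object shaped like the attributes named above.
stub_pipe = SimpleNamespace(
    task="text-generation",
    model=SimpleNamespace(config=SimpleNamespace(name_or_path="sshleifer/tiny-gpt2")),
)
stub_metadata = pipeline_metadata(stub_pipe)
```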

Token counting: For text generation pipelines, pipeline.tokenizer.encode(input_text) gives prompt token counts before generation. generation_config.max_new_tokens is only an upper bound on the number of new tokens, so the actual count should be derived by comparing input and output token lengths.
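The length comparison can be sketched as follows. WhitespaceTokenizer is a stand-in that only mimics the encode() call, and token_metrics and its metric names are hypothetical. The sketch assumes the generated text includes the prompt, which is not the case when the pipeline is configured to return only the continuation. Real tokenizers may also add special tokens, so the counts are approximate.

```python
from typing import Any, Dict


class WhitespaceTokenizer:
    """Stand-in tokenizer exposing encode(); splits on whitespace."""

    def encode(self, text: str):
        return text.split()


def token_metrics(tokenizer: Any, prompt: str, generated_text: str) -> Dict[str, int]:
    """Estimate token counts by comparing prompt and output lengths.

    Assumption: generated_text contains the prompt followed by the
    newly generated continuation.
    """
    prompt_tokens = len(tokenizer.encode(prompt))
    total_tokens = len(tokenizer.encode(generated_text))
    # Clamp at zero in case the output does not include the prompt.
    completion_tokens = max(total_tokens - prompt_tokens, 0)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "tokens": prompt_tokens + completion_tokens,
    }


example_metrics = token_metrics(WhitespaceTokenizer(), "a b", "a b c d e")
```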

Generation config: Parameters like max_new_tokens, temperature, do_sample, top_p, top_k, repetition_penalty are passed as kwargs to __call__() and should be captured in span metadata.
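One simple way to capture these is an allowlist, sketched below. The name generation_metadata is hypothetical, and the allowlist contains only the parameters named above; a real integration would likely extend it.

```python
from typing import Any, Dict

# Parameters named in the paragraph above; extend as needed.
GENERATION_PARAMS = (
    "max_new_tokens",
    "temperature",
    "do_sample",
    "top_p",
    "top_k",
    "repetition_penalty",
)


def generation_metadata(call_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only known generation parameters from the __call__ kwargs."""
    return {
        name: call_kwargs[name]
        for name in GENERATION_PARAMS
        if name in call_kwargs
    }
```

An allowlist avoids logging large or non-serializable kwargs by accident, at the cost of missing parameters that are not listed.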

Batch inputs: Pipeline __call__() accepts a single string or a list of strings. The integration should handle both cases, capturing input length metrics.
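Handling both shapes can be as small as the sketch below. The function name input_metrics and the metric names batch_size and input_chars are hypothetical choices for illustration.

```python
from typing import Any, Dict


def input_metrics(inputs: Any) -> Dict[str, int]:
    """Normalize a string or list of strings and compute size metrics."""
    # A bare string is a batch of one; anything else is treated as a batch.
    batch = [inputs] if isinstance(inputs, str) else list(inputs)
    return {
        "batch_size": len(batch),
        # Count characters only for string items; other items are skipped.
        "input_chars": sum(len(item) for item in batch if isinstance(item, str)),
    }
```

Checking for str first matters because a string is itself iterable, and calling list() on it would split it into characters.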

Device/dtype metadata: pipeline.device and pipeline.torch_dtype are useful observability fields (e.g., "cuda:0", torch.float16).

No HTTP requests: Because all computation is local, VCR cannot be used. Tests should use small, fast models ("sshleifer/tiny-gpt2", "facebook/opt-125m") loaded from a local cache, or mock the forward pass. CI can run without GPU using CPU-only tiny models.

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/transformers/)
  • No wrapper function (e.g. wrap_transformers())
  • No patcher in any existing integration
  • No nox test session (test_transformers)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A grep for transformers across py/src/braintrust/integrations/ returns zero matches in integration code (only incidental Python import-related comments in non-integration files).

Braintrust docs status

Not found: transformers is not listed on the Braintrust integrations directory or the tracing guide. There is no auto_instrument() reference, no wrap_transformers() function, and no transformers.pipeline() setup documentation anywhere in Braintrust docs.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no transformers/ directory exists on main
  • py/src/braintrust/wrappers/ — no transformers wrapper
  • py/noxfile.py — no test_transformers session
  • py/src/braintrust/integrations/__init__.py — transformers not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no transformers version matrix
  • py/pyproject.toml [tool.braintrust.matrix] — no transformers entry; lint dependency group does not include transformers
  • py/src/braintrust/auto.py — transformers not listed in auto_instrument() parameters
  • Full repo grep for transformers (as import target) across py/src/braintrust/integrations/ — zero matches in integration code
