Summary
The Hugging Face transformers library is the foundational Python library for running pretrained AI models locally, with over 150M monthly PyPI downloads and 140k+ GitHub stars. Its pipeline() API is the primary high-level execution surface for text generation, summarization, translation, question answering, and other tasks using locally loaded models. The latest release is v5.12.0 (June 12, 2026). This repository has zero instrumentation for any transformers execution surface: no integration directory, no wrapper, no patcher, and no auto_instrument() support.
This gap is distinct from the existing huggingface_hub integration, which traces cloud inference calls made over HTTP through InferenceClient.chat_completion() and InferenceClient.text_generation(). transformers.pipeline() runs models in-process and makes no HTTP request, so neither wrap_openai() nor any other existing cloud API wrapper can trace it.
Comparable Python frameworks (DSPy, LlamaIndex, Agno) already have dedicated integrations in this repo that wrap Python function calls directly instead of relying on an HTTP boundary.
What needs to be instrumented
The transformers package (v5.12.0) exposes these execution surfaces, none of which are instrumented:
Pipeline API (highest priority)
| Function / Method | Description | Return type |
|---|---|---|
| `pipeline(task, model=..., ...)` | Factory function creating a pipeline for a specific NLP task | `Pipeline` subclass |
| `TextGenerationPipeline.__call__(text_inputs, ...)` | Generate text from a prompt using a causal LM | `list[dict]` with `generated_text` |
| `Text2TextGenerationPipeline.__call__(text_inputs, ...)` | Seq2seq text generation (T5, BART, etc.) | `list[dict]` with `generated_text` |
| `SummarizationPipeline.__call__(documents, ...)` | Summarize input documents | `list[dict]` with `summary_text` |
| `TranslationPipeline.__call__(text_inputs, ...)` | Translate text between languages | `list[dict]` with `translation_text` |
| `ConversationalPipeline.__call__(conversations, ...)` | Multi-turn conversational generation | `Conversation` |
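All pipeline `__call__` methods share a common invocation pattern, `pipe(input_text, **kwargs)`, which returns a list of result dicts. Each pipeline type exposes task-specific generation parameters (max_new_tokens, temperature, do_sample, top_p, top_k, etc.). Note that ConversationalPipeline and the Conversation class were removed in transformers v4.42, so they do not exist in v5.12.0; multi-turn chat now goes through TextGenerationPipeline with chat messages (see Chat template execution below).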
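Because each task stores its output under a different key, an integration needs some way to turn a result into span output text. The sketch below is one possible approach: it maps a task name to the result key from the table above and flattens nested results. The helper names are hypothetical, the mapping covers only the tasks listed here, and treating a list-valued generated_text as a chat transcript is an assumption about chat-mode output.

```python
from typing import Any, List, Optional

# Result-dict keys per task, taken from the return types in the table above.
_OUTPUT_KEYS = {
    "text-generation": "generated_text",
    "text2text-generation": "generated_text",
    "summarization": "summary_text",
}


def output_key_for_task(task: str) -> Optional[str]:
    """Return the result-dict key holding output text for a task, or None."""
    # Translation tasks may be named "translation" or "translation_xx_to_yy".
    if task == "translation" or task.startswith("translation_"):
        return "translation_text"
    return _OUTPUT_KEYS.get(task)


def extract_output_texts(task: str, result: Any) -> List[str]:
    """Flatten a dict, list of dicts, or list of lists into output strings."""
    key = output_key_for_task(task)
    if key is None:
        return []  # unknown task: record nothing rather than guess
    if isinstance(result, list):
        texts: List[str] = []
        for item in result:
            texts.extend(extract_output_texts(task, item))
        return texts
    if isinstance(result, dict):
        value = result.get(key)
        if isinstance(value, str):
            return [value]
        # Assumption: chat inputs return the conversation as a list of
        # message dicts, so the newest message is taken as the output.
        if isinstance(value, list) and value and isinstance(value[-1], dict):
            content = value[-1].get("content")
            return [content] if isinstance(content, str) else []
    return []
```

Returning an empty list for unrecognized tasks keeps the patch safe to apply at the base class, where it will also see pipelines whose outputs are not text.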
Model-level generation (lower priority)
| Method | Description |
|---|---|
| `PreTrainedModel.generate(inputs, ...)` | Direct model generation; lower-level than Pipeline and used for full control over decoding |
Chat template execution
| Method | Description |
|---|---|
| `Pipeline.__call__(messages, ...)` with chat template | Pass a list of `{"role": ..., "content": ...}` messages to chat-capable text generation pipelines |
Chat template support was added to TextGenerationPipeline in recent releases, making it a first-class execution surface for chat completions using locally loaded chat models (Llama 3, Mistral Instruct, Qwen, etc.).
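Since the same TextGenerationPipeline accepts both plain prompts and chat messages, the integration has to tell them apart before choosing span input fields. A minimal sketch, assuming a single conversation is a list of message dicts and a batch of conversations is a list of such lists; the function names and labels are hypothetical.

```python
from typing import Any


def _is_message(obj: Any) -> bool:
    """True for a {"role": ..., "content": ...} chat message dict."""
    return isinstance(obj, dict) and "role" in obj and "content" in obj


def classify_pipeline_input(inputs: Any) -> str:
    """Label pipeline input as text, text_batch, chat, chat_batch, or unknown."""
    if isinstance(inputs, str):
        return "text"
    if isinstance(inputs, (list, tuple)) and inputs:
        if all(isinstance(item, str) for item in inputs):
            return "text_batch"
        if all(_is_message(item) for item in inputs):
            return "chat"  # one conversation
        if all(
            isinstance(conv, (list, tuple)) and conv and all(_is_message(m) for m in conv)
            for conv in inputs
        ):
            return "chat_batch"  # assumed form: a list of conversations
    return "unknown"
```

Chat inputs could then be logged as a messages array on the span, matching how cloud chat completions are usually recorded, while plain prompts stay as strings.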
Implementation notes
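Task-based dispatch: pipeline() returns a subclass of Pipeline based on the task argument ("text-generation", "summarization", "translation_en_to_fr", etc.). The integration should either patch Pipeline.__call__() at the base class level, which every subclass reaches because subclass overrides delegate through super().__call__(), or patch individual subclasses to capture task-specific span metadata. A base-level patch may see arguments after subclass-specific preprocessing (for example, chat messages converted to an internal representation).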
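A base-class patch could look roughly like the sketch below. It does not reflect Braintrust's real patcher API: patch_call, the record callback, and the _bt_patched marker are hypothetical names, and the Fake* classes only mimic the "subclass overrides __call__ and calls super()" pattern rather than being transformers code. The marker makes the patch idempotent, so repeated setup calls do not double-wrap.

```python
import functools
from typing import Any, Callable, Optional

# Hypothetical callback signature: (pipeline, args, kwargs, result, error).
Recorder = Callable[[Any, tuple, dict, Any, Optional[BaseException]], None]

_PATCH_MARKER = "_bt_patched"  # hypothetical marker attribute


def patch_call(cls: type, record: Recorder) -> bool:
    """Wrap cls.__call__ once so every invocation is reported to `record`."""
    original = cls.__dict__.get("__call__")
    if original is None or getattr(original, _PATCH_MARKER, False):
        return False  # nothing defined on this class, or already patched

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        try:
            result = original(self, *args, **kwargs)
        except BaseException as exc:
            record(self, args, kwargs, None, exc)  # log the failure, then re-raise
            raise
        record(self, args, kwargs, result, None)
        return result

    setattr(wrapper, _PATCH_MARKER, True)
    setattr(cls, "__call__", wrapper)
    return True


# Stand-ins mimicking the subclass-calls-super pattern (not transformers code).
class FakePipeline:
    def __call__(self, inputs, **kwargs):
        return [{"generated_text": inputs + "!"}]


class FakeTextGenerationPipeline(FakePipeline):
    def __call__(self, text_inputs, **kwargs):
        return super().__call__(text_inputs, **kwargs)


calls = []
patch_call(FakePipeline, lambda pipe, a, kw, res, err: calls.append((a, kw, res, err)))
FakeTextGenerationPipeline()("hi", max_new_tokens=5)
```

Patching the base class means one patch covers every task, but the recorded arguments are whatever the subclass forwarded; patching each subclass instead would capture the raw user input at the cost of one patch per pipeline type.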
Model metadata: pipeline.model.config.name_or_path provides the model identifier (e.g., "meta-llama/Meta-Llama-3-8B-Instruct"). pipeline.task provides the task name. Both are useful span metadata.
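The two fields above can be read defensively so that the patch never fails on an unusual pipeline. A minimal sketch using getattr fallbacks; the helper name and metadata keys are hypothetical, the fallback to a name_or_path attribute on the model itself is an assumption, and SimpleNamespace stands in for a real pipeline.

```python
from types import SimpleNamespace
from typing import Any, Dict


def pipeline_metadata(pipe: Any) -> Dict[str, str]:
    """Collect the task name and model identifier from a pipeline-like object."""
    metadata: Dict[str, str] = {}
    task = getattr(pipe, "task", None)
    if task:
        metadata["task"] = task
    model = getattr(pipe, "model", None)
    config = getattr(model, "config", None)
    # Prefer config.name_or_path; the model-level attribute is an assumed fallback.
    name = getattr(config, "name_or_path", None) or getattr(model, "name_or_path", None)
    if name:
        metadata["model"] = name
    return metadata


# Stand-in object shaped like a pipeline (not a real transformers pipeline).
fake_pipe = SimpleNamespace(
    task="text-generation",
    model=SimpleNamespace(config=SimpleNamespace(name_or_path="meta-llama/Meta-Llama-3-8B-Instruct")),
)
print(pipeline_metadata(fake_pipe))
```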
Token counting: For text generation pipelines, pipeline.tokenizer.encode(input_text) gives the prompt token count before generation. generation_config.max_new_tokens is only an upper bound, because generation can stop early (for example at an EOS token), so the actual number of new tokens has to be measured by comparing input and output token lengths.
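For the lower-level generate() path, token usage can be derived from sequence lengths. A minimal sketch, assuming a single unpadded sequence from a decoder-only model, where the returned sequence is the prompt followed by the new tokens; encoder-decoder models and left-padded batches need different handling. The function name and usage keys are hypothetical.

```python
from typing import Dict, Optional, Sequence, Union


def decoder_only_usage(
    prompt_ids: Sequence[int],
    sequence_ids: Sequence[int],
    max_new_tokens: Optional[int] = None,
) -> Dict[str, Union[int, bool]]:
    """Token usage for one unpadded decoder-only generate() sequence."""
    n_prompt = len(prompt_ids)
    # Decoder-only output is assumed to start with the prompt tokens.
    if list(sequence_ids[:n_prompt]) != list(prompt_ids):
        raise ValueError("sequence does not start with the prompt tokens")
    n_completion = len(sequence_ids) - n_prompt
    usage: Dict[str, Union[int, bool]] = {
        "prompt_tokens": n_prompt,
        "completion_tokens": n_completion,
        "total_tokens": len(sequence_ids),
    }
    if max_new_tokens is not None:
        # max_new_tokens is a cap, not a count: generation may stop earlier.
        usage["stopped_before_limit"] = n_completion < max_new_tokens
    return usage
```

In a real integration, prompt_ids might come from tokenizer(text)["input_ids"] and sequence_ids from model.generate(...)[0].tolist().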
Generation config: Parameters like max_new_tokens, temperature, do_sample, top_p, top_k, repetition_penalty are passed as kwargs to __call__() and should be captured in span metadata.
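Capturing these parameters can be as simple as filtering the call kwargs against an allow-list, so unrelated or non-serializable arguments never reach the span. A minimal sketch; the allow-list mirrors the parameters named above and is illustrative rather than exhaustive. Merging a nested generate_kwargs dict is an assumption (some pipelines accept one), and giving top-level kwargs precedence is a design choice of this sketch.

```python
from typing import Any, Dict, Mapping

# Parameters named in this issue; illustrative, not exhaustive.
GENERATION_PARAMS = (
    "max_new_tokens",
    "temperature",
    "do_sample",
    "top_p",
    "top_k",
    "repetition_penalty",
)


def generation_metadata(call_kwargs: Mapping[str, Any]) -> Dict[str, Any]:
    """Pick known generation parameters out of a pipeline __call__ kwargs dict."""
    merged: Dict[str, Any] = {}
    nested = call_kwargs.get("generate_kwargs")
    if isinstance(nested, Mapping):
        merged.update(nested)  # assumed nested form used by some pipelines
    merged.update(call_kwargs)  # top-level kwargs win over the nested dict
    return {name: merged[name] for name in GENERATION_PARAMS if name in merged}
```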
Batch inputs: Pipeline __call__() accepts either a single string or a list of strings, and for text generation a batched call returns one list of result dicts per input. The integration should handle both shapes and capture input length metrics.
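Normalizing both input shapes into a list makes the length metrics uniform. A minimal sketch covering plain-text inputs only (chat messages are skipped); the metric names are hypothetical, and the optional count_tokens callable is where something like lambda s: len(pipe.tokenizer.encode(s)) could be plugged in.

```python
from typing import Any, Callable, Dict, List, Optional


def input_metrics(inputs: Any, count_tokens: Optional[Callable[[str], int]] = None) -> Dict[str, int]:
    """Batch size, character count, and optional token count for text inputs."""
    if isinstance(inputs, str):
        texts: List[str] = [inputs]  # single prompt becomes a batch of one
    elif isinstance(inputs, (list, tuple)):
        texts = [item for item in inputs if isinstance(item, str)]
    else:
        texts = []
    metrics = {"batch_size": len(texts), "input_chars": sum(len(t) for t in texts)}
    if count_tokens is not None:
        metrics["prompt_tokens"] = sum(count_tokens(t) for t in texts)
    return metrics
```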
Device/dtype metadata: pipeline.device and pipeline.torch_dtype are useful observability fields (e.g., "cuda:0", torch.float16).
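Device and dtype objects are not JSON values, so they need converting to strings before going into span metadata. A minimal sketch; the helper name is hypothetical, and reading model.dtype before the pipeline's torch_dtype attribute is a design choice of this sketch. SimpleNamespace objects stand in for real pipelines in the usage below.

```python
from types import SimpleNamespace
from typing import Any, Dict


def runtime_metadata(pipe: Any) -> Dict[str, str]:
    """Serialize device and dtype into span-friendly strings."""
    metadata: Dict[str, str] = {}
    device = getattr(pipe, "device", None)
    if device is not None:
        metadata["device"] = str(device)  # str(torch.device("cuda:0")) == "cuda:0"
    # Read dtype from the model first, then the pipeline attribute named in the issue.
    dtype = getattr(getattr(pipe, "model", None), "dtype", None)
    if dtype is None:
        dtype = getattr(pipe, "torch_dtype", None)
    if dtype is not None:
        metadata["dtype"] = str(dtype)  # str(torch.float16) == "torch.float16"
    return metadata


print(runtime_metadata(SimpleNamespace(device="cpu", torch_dtype="torch.float32")))
```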
No HTTP requests: Because all computation is local, VCR cannot be used. Tests should use small, fast models ("sshleifer/tiny-gpt2", "facebook/opt-125m") loaded from a local cache, or mock the forward pass. CI can run without GPU using CPU-only tiny models.
No coverage in any instrumentation layer
- No integration directory (py/src/braintrust/integrations/transformers/)
- No wrapper function (e.g. wrap_transformers())
- No patcher in any existing integration
- No nox test session (test_transformers)
- No version entry in py/src/braintrust/integrations/versioning.py
- No mention in py/src/braintrust/integrations/__init__.py
A grep for transformers across py/src/braintrust/integrations/ returns zero matches in integration code (only incidental Python import-related comments in non-integration files).
Braintrust docs status
not_found — transformers is not listed on the Braintrust integrations directory or the tracing guide. There is no auto_instrument() reference, no wrap_transformers() function, and no transformers.pipeline() setup documentation anywhere in Braintrust docs.
Upstream references
Local repo files inspected
- py/src/braintrust/integrations/: no transformers/ directory exists on main
- py/src/braintrust/wrappers/: no transformers wrapper
- py/noxfile.py: no test_transformers session
- py/src/braintrust/integrations/__init__.py: transformers not listed in integration registry
- py/src/braintrust/integrations/versioning.py: no transformers version matrix
- py/pyproject.toml [tool.braintrust.matrix]: no transformers entry; lint dependency group does not include transformers
- py/src/braintrust/auto.py: transformers not listed in auto_instrument() parameters
- Full repo grep for transformers (as import target) across py/src/braintrust/integrations/: zero matches in integration code