Summary
The sentence-transformers package is the most widely used Python library for generating local text embeddings. Its SentenceTransformer.encode() method is the primary embedding execution API in the Python AI ecosystem, powering RAG pipelines, semantic search, clustering, and reranking across thousands of applications. The latest release is v5.5.1 (May 12, 2026). This repository has zero instrumentation for any sentence-transformers execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support.
This gap is distinct from the existing huggingface_hub integration, which traces cloud inference through InferenceClient.feature_extraction() (Hugging Face Inference API). sentence-transformers runs models locally on a different execution path. Model weights may be fetched from the Hugging Face Hub when a model is first loaded, but SentenceTransformer(model_name).encode(texts) runs in-process, calls no remote inference API, and cannot be traced through huggingface_hub or any existing integration.
Comparable embedding/execution libraries with dedicated integrations in this repo: huggingface_hub (cloud inference), openai (embeddings via client.embeddings.create()), cohere (embeddings via client.embed()).
What needs to be instrumented
The sentence-transformers package exposes these execution surfaces, none of which are instrumented:
Dense embeddings (highest priority)
| Class / Method | Description | Return type |
| --- | --- | --- |
| SentenceTransformer.encode(sentences, ...) | Generate dense vector embeddings for a list of sentences; the primary execution surface | np.ndarray (default), Tensor, or list[Tensor] |
| SentenceTransformer.encode_multi_process(sentences, pool, ...) | Parallel embedding generation across multiple CPUs/GPUs | np.ndarray |
encode() accepts batch_size, show_progress_bar, output_value (sentence_embedding, token_embeddings), precision (float32, int8, uint8, binary, ubinary), convert_to_numpy, convert_to_tensor, device, normalize_embeddings.
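These keyword arguments can become span metadata without hardcoding each release's defaults: bind the call against the wrapped method's own signature. This is a minimal sketch under that assumption. The helper name encode_call_metadata, the TRACKED_PARAMS tuple, and the fake_encode stand-in are hypothetical. The defaults in fake_encode are illustrative and may not match every sentence-transformers version.

```python
import inspect
from typing import Any, Callable

# Hypothetical: the subset of encode() parameters worth recording as span metadata.
TRACKED_PARAMS = (
    "batch_size",
    "output_value",
    "precision",
    "convert_to_numpy",
    "convert_to_tensor",
    "device",
    "normalize_embeddings",
)


def encode_call_metadata(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> dict:
    """Bind a call against fn's signature (defaults included) and keep tracked params."""
    try:
        bound = inspect.signature(fn).bind(*args, **kwargs)
    except TypeError:
        # Signature mismatch: fall back to the explicitly passed keyword arguments.
        return {k: v for k, v in kwargs.items() if k in TRACKED_PARAMS}
    bound.apply_defaults()  # fill in defaults the caller did not pass explicitly
    return {name: bound.arguments[name] for name in TRACKED_PARAMS if name in bound.arguments}


# Stand-in with an encode()-like signature; real defaults may differ by version.
def fake_encode(self, sentences, batch_size=32, output_value="sentence_embedding",
                precision="float32", convert_to_numpy=True, convert_to_tensor=False,
                device=None, normalize_embeddings=False, **kwargs):
    return None
```

Inside a class-level wrapper, fn would be the original unbound method and args would start with self. Binding against the live signature means version-specific defaults are picked up automatically.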
Token counting: encode() tokenizes inputs internally but does not return token counts. Input token counts have to be computed separately with the model's tokenizer before encoding.
Async: No async variant exists in the standard API; parallelism is via encode_multi_process().
Reranking (CrossEncoder)
| Class / Method | Description | Return type |
| --- | --- | --- |
| CrossEncoder.predict(sentence_pairs, ...) | Compute similarity scores for sentence pairs; used for reranking retrieved documents | np.ndarray |
| CrossEncoder.rank(query, documents, ...) | Rank documents against a query using cross-encoder scoring | list[dict] |
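rank() returns a list of dicts, so a span can log a compact summary of the ranking instead of every document. This minimal sketch assumes each result dict carries corpus_id and score keys, with text added only when return_documents=True. The helper name summarize_rank_output and the top_k default are hypothetical choices, not part of any existing integration.

```python
from typing import Any


def summarize_rank_output(results: list[dict[str, Any]], top_k: int = 3) -> dict[str, Any]:
    """Reduce CrossEncoder.rank()-style results to a compact span output."""
    # Results are expected to arrive sorted by descending score already;
    # sorting again keeps the summary correct if that assumption does not hold.
    ordered = sorted(results, key=lambda r: float(r["score"]), reverse=True)
    return {
        "num_documents": len(results),
        "top": [
            # float() also converts NumPy scalar scores into plain JSON-friendly floats
            {"corpus_id": r["corpus_id"], "score": round(float(r["score"]), 4)}
            for r in ordered[:top_k]
        ],
    }
```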
Sparse embeddings (SparseEncoder)
| Class / Method | Description | Return type |
| --- | --- | --- |
| SparseEncoder.encode(sentences, ...) | Generate sparse vector representations; used with SPLADE and similar models | Sparse torch.Tensor by default; SparseEncoder.decode() maps it to (token, weight) pairs |
Implementation notes
Local inference model: Unlike cloud API SDKs, sentence-transformers loads models locally using PyTorch. Patching the execution surface means wrapping SentenceTransformer.encode() directly rather than intercepting HTTP requests.
Span metrics: Useful span fields include:
input: the list of sentences encoded
output: embedding dimensions/count (not the raw vectors — potentially large)
metadata: model_name_or_path, device, precision, normalize_embeddings, model architecture details from SentenceTransformer.model_card_data
metrics: encoding latency, sentence count, approximate token counts
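To keep large vectors out of the span output, the wrapper can record only shape information. This sketch relies on duck typing. np.ndarray and torch.Tensor both expose a .shape attribute, and a list[Tensor] result is summarized element by element. The function name summarize_embeddings is hypothetical.

```python
from typing import Any


def summarize_embeddings(result: Any) -> dict[str, Any]:
    """Describe an encode() result by shape instead of logging raw vectors."""
    shape = getattr(result, "shape", None)
    if shape is not None:
        # np.ndarray / torch.Tensor: [count, dim] for a batch, [dim] for a single string
        dims = [int(d) for d in shape]
        count = dims[0] if len(dims) > 1 else 1
        return {"count": count, "shape": dims}
    if isinstance(result, (list, tuple)):
        # list[Tensor]: one entry per input, e.g. token_embeddings of varying length
        shapes = [[int(d) for d in getattr(item, "shape", ())] for item in result]
        return {"count": len(result), "shapes": shapes}
    return {"type": type(result).__name__}
```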
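Token count estimation: The tokenizer is accessible at SentenceTransformer.tokenizer. Token counts for the input can be computed before encoding via len(tokenizer.encode(sentence)). This count includes special tokens, as the model's own input does. It ignores truncation at SentenceTransformer.max_seq_length, however, so it overstates long inputs. It also costs an extra tokenization pass.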
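This is a sketch of that estimate, assuming a Hugging Face-style tokenizer whose encode(text) returns a list of token IDs. Capping each count at max_seq_length (for example, SentenceTransformer.max_seq_length) approximates the truncation the model applies. count_input_tokens is a hypothetical helper. ToyTokenizer is a hypothetical stand-in so the example runs without downloading a model.

```python
from typing import Optional, Protocol, Sequence


class Tokenizer(Protocol):
    def encode(self, text: str) -> list[int]: ...


def count_input_tokens(tokenizer: Tokenizer, sentences: Sequence[str],
                       max_seq_length: Optional[int] = None) -> int:
    """Approximate the total input tokens for one encode() call."""
    total = 0
    for sentence in sentences:
        n = len(tokenizer.encode(sentence))
        if max_seq_length is not None:
            n = min(n, max_seq_length)  # the model never sees tokens past this limit
        total += n
    return total


class ToyTokenizer:
    """Hypothetical stand-in: whitespace split plus two special tokens, like [CLS]/[SEP]."""

    def encode(self, text: str) -> list[int]:
        return [101] + [hash(w) % 30000 for w in text.split()] + [102]
```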
Model identification: SentenceTransformer.model_card_data.model_id or the constructor argument provides the model name for span metadata.
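The two sources can be combined in a fallback chain. This sketch assumes model_card_data.model_id may be unset for a model loaded by name, which is worth checking against the installed version. For that case it also records the constructor argument by wrapping __init__. record_model_name, resolve_model_name, and the _bt_model_name attribute are hypothetical names.

```python
import functools
from typing import Any, Optional


def record_model_name(cls: type) -> None:
    """Wrap cls.__init__ so the first constructor argument is remembered."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self: Any, *args: Any, **kwargs: Any) -> None:
        original_init(self, *args, **kwargs)
        # Assumes the first parameter is named model_name_or_path when passed by keyword.
        name = args[0] if args else kwargs.get("model_name_or_path")
        self._bt_model_name = name  # hypothetical attribute used only by tracing

    cls.__init__ = __init__


def resolve_model_name(model: Any) -> Optional[str]:
    """Prefer model_card_data.model_id, else the recorded constructor argument."""
    card = getattr(model, "model_card_data", None)
    model_id = getattr(card, "model_id", None)
    if model_id:
        return model_id
    return getattr(model, "_bt_model_name", None)
```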
Patching strategy: Wrap SentenceTransformer.encode and CrossEncoder.predict at the class level; SparseEncoder.encode can be wrapped with the same pattern as SentenceTransformer.encode.
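The following is a minimal sketch of class-level patching. It uses a hypothetical record_span callback in place of the real Braintrust span API, which it does not call.

The wrapper:

- wraps a named method once, guarding against double patching;
- times each call and reports the error if one is raised;
- keeps the original method so it can be restored.

patch_method, unpatch_method, and the __bt_original__ attribute are hypothetical names. The input_param default of "sentences" is an assumption to check against each installed method signature.

```python
import functools
import time
from typing import Any, Callable, Optional

# Hypothetical stand-in for the real span API: receives one dict per call.
SpanRecorder = Callable[[dict], None]


def _count_inputs(inputs: Any) -> Optional[int]:
    """Top-level input count; a bare string counts as one sentence."""
    if isinstance(inputs, str):
        return 1
    try:
        return len(inputs)
    except TypeError:  # e.g. a generator
        return None


def patch_method(cls: type, name: str, record_span: SpanRecorder,
                 input_param: str = "sentences") -> None:
    """Wrap cls.<name> at the class level so every instance is traced."""
    original = getattr(cls, name)
    if hasattr(original, "__bt_original__"):
        return  # already patched (possibly via a parent class); avoid stacking wrappers

    @functools.wraps(original)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        inputs = args[0] if args else kwargs.get(input_param)
        start = time.perf_counter()
        error: Optional[str] = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise  # never swallow the caller's exception
        finally:
            try:
                record_span({
                    "name": f"{cls.__name__}.{name}",
                    "metrics": {
                        "latency_s": time.perf_counter() - start,
                        "num_inputs": _count_inputs(inputs),
                    },
                    "error": error,
                })
            except Exception:
                pass  # a tracing failure must not break the user's call

    wrapper.__bt_original__ = original
    setattr(cls, name, wrapper)


def unpatch_method(cls: type, name: str) -> None:
    """Restore the original method if patch_method installed a wrapper."""
    original = getattr(getattr(cls, name), "__bt_original__", None)
    if original is not None:
        setattr(cls, name, original)
```

Against the real library, the same helper would be applied to SentenceTransformer, CrossEncoder, and SparseEncoder. Patching the class rather than instances means models created before or after patching are both traced.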
No coverage in any instrumentation layer
- No integration directory (py/src/braintrust/integrations/sentence_transformers/)
- No wrapper function (e.g. wrap_sentence_transformers())
- No patcher in any existing integration
- No nox test session (test_sentence_transformers)
- No version entry in py/src/braintrust/integrations/versioning.py
- No mention in py/src/braintrust/integrations/__init__.py
A grep for sentence.transformers, sentence_transformers, or sbert across py/src/braintrust/ returns zero matches.
Braintrust docs status
not_found — sentence-transformers is not listed on the Braintrust integrations directory or the tracing guide. There is no auto_instrument() reference, no wrap_sentence_transformers() function, and no sentence-transformers setup documentation anywhere in Braintrust docs.
Upstream references
- SentenceTransformer.encode() API reference: https://www.sbert.net/docs/package_reference/SentenceTransformer.html
- CrossEncoder API reference: https://www.sbert.net/docs/package_reference/cross_encoder/CrossEncoder.html
- SparseEncoder usage: https://www.sbert.net/docs/package_reference/sparse_encoder/SparseEncoder.html
Local repo files inspected
- py/src/braintrust/integrations/ — no sentence_transformers/ directory exists on main
- py/src/braintrust/wrappers/ — no sentence-transformers wrapper
- py/noxfile.py — no test_sentence_transformers session
- py/src/braintrust/integrations/__init__.py — sentence-transformers not listed in integration registry
- py/src/braintrust/integrations/versioning.py — no sentence-transformers version matrix
- py/pyproject.toml [tool.braintrust.matrix] — no sentence-transformers entry
- py/src/braintrust/auto.py — sentence-transformers not listed in auto_instrument() parameters
- Full repo grep for sentence.transformers, sentence_transformers, sbert across py/src/braintrust/ — zero matches