docs: document attention backend selection by sbhavani · Pull Request #3142 · NVIDIA/TransformerEngine

sbhavani · 2026-06-24T18:31:34Z

Description

Documents TE’s high-level attention backend selection logic, including Hopper/Blackwell preferences, FA v3/v4 behavior, and cuDNN FusedAttention sub-backend context.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

docs/examples/attention/attention.ipynb: Expanded the backend selection section with Hopper/Blackwell preference order, FA v3/v4 behavior, cuDNN sub-backend context, and fallback behavior.
docs/envvars.rst: Added a compact backend-selection overview and clarified NVTE_FUSED_ATTN_BACKEND.
transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py: Updated the API docstring to match the current architecture-specific backend priority.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>

greptile-apps · 2026-06-24T18:41:06Z

Greptile Summary

This PR documents TE's attention backend selection logic across three files, covering Hopper/Blackwell GPU preference ordering, FA v3/v4 behavior, and cuDNN FusedAttention sub-backend context. The documentation is consistent with the actual backend-selection code in utils.py.

attention.ipynb replaces a high-level, partially outdated selection table with an architecture-specific breakdown (sm8x/sm90/sm100+sm120), removes the obsolete sub-backend 0 row (which no longer exists in the C++ NVTE_Fused_Attn_Backend enum), and adds FA v3/v4 preference notes and a clarified env-var support table.
envvars.rst adds a new introductory section explaining the two-stage (filter → preference-order) selection model and expands the NVTE_FUSED_ATTN_BACKEND description to clarify it is a request, not a force.
dot_product_attention.py updates the API docstring to match the current Hopper-vs-pre-Hopper preference behavior.
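As a rough illustration of the filter stage, the sketch below builds the environment variables named in this PR. The helper `attention_backend_env` is hypothetical and not part of TE; it assumes the usual "0"/"1" convention for the enable flags and an integer sub-backend number for NVTE_FUSED_ATTN_BACKEND, which per the description above is a request rather than a force.

```python
import os


def attention_backend_env(flash=True, fused=True, unfused=True, fused_sub_backend=None):
    """Hypothetical helper: build env-var settings that filter attention backends.

    Assumes "1" enables and "0" disables a backend family.
    """
    env = {
        "NVTE_FLASH_ATTN": "1" if flash else "0",
        "NVTE_FUSED_ATTN": "1" if fused else "0",
        "NVTE_UNFUSED_ATTN": "1" if unfused else "0",
    }
    if fused_sub_backend is not None:
        # A request for a cuDNN sub-backend, not a guarantee that it is used.
        env["NVTE_FUSED_ATTN_BACKEND"] = str(fused_sub_backend)
    return env


# Example: rule out FlashAttention and request cuDNN sub-backend 1.
# Assumption: the variables are set before the attention backend is selected.
os.environ.update(attention_backend_env(flash=False, fused_sub_backend=1))
```

Setting a flag to "0" only removes that family from the candidate list; the preference order is still applied to whatever remains eligible.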

Confidence Score: 5/5

Documentation-only change with no impact on runtime behavior; safe to merge.

All three documentation changes were cross-checked against the actual backend-selection code in utils.py and the C++ NVTE_Fused_Attn_Backend enum. FA3 sm90-only restriction, FA4 sm80+ coverage, the FA3-over-FA4 preference on Hopper, the FusedAttention-first ordering on Hopper+ (>=sm90), the FP8 disablement on sm120, and the removal of sub-backend 0 (which no longer exists as an enum value) are all consistent with the implementation. The revised env-var support table in the notebook (separating PyTorch-only from PyTorch+JAX variables) was verified against the JAX source.
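The data-type side of these checks can be sketched as a small lookup. This is an illustrative model only, not the logic in utils.py: the function name `fused_sub_backend` and the string dtype labels are hypothetical, and it assumes "FP8 disablement on sm120" applies to exactly compute capability 120.

```python
def fused_sub_backend(dtype, sm):
    """Hypothetical model: pick a cuDNN FusedAttention sub-backend number.

    Returns 1 for BF16/FP16, 2 for FP8, or None when FusedAttention
    is assumed to be ineligible.
    """
    if sm < 80:
        return None  # FusedAttention requires sm80+ per the review summary
    if dtype in ("bf16", "fp16"):
        return 1
    if dtype == "fp8":
        # Assumption: FP8 attention is disabled on sm120.
        return None if sm == 120 else 2
    return None
```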

No files require special attention.

Important Files Changed

Filename	Overview
docs/examples/attention/attention.ipynb	Removes obsolete sub-backend 0 table row, adds architecture-specific preference order table, expands FA v3/v4 notes, and corrects env-var support list — all accurately reflect the code.
docs/envvars.rst	Adds a two-stage backend-selection overview and improves the NVTE_FUSED_ATTN_BACKEND description; both additions are accurate.
transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py	Docstring updated to accurately describe the architecture-aware preference ordering (pre-Hopper: FA > FusedAttn; Hopper+: FusedAttn > FA), matching the code in utils.py.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DotProductAttention call] --> B{Stage 1: Filter by eligibility}
    B --> C[Check env vars\nNVTE_FLASH_ATTN / NVTE_FUSED_ATTN / NVTE_UNFUSED_ATTN]
    C --> D[Check GPU arch\nsm80+ for FA2/FusedAttn\nsm90 only for FA3\nsm80+ for FA4]
    D --> E[Check installed flash-attn version\nFA2 / FA3 / FA4]
    E --> F[Check cuDNN version & data type\nBF16/FP16 → sub-backend 1\nFP8 DPA → sub-backend 2]
    F --> G[Check input config\nseqlen, heads, layout, mask, dropout, etc.]
    G --> H{Stage 2: Apply preference order}
    H --> I{GPU arch >= sm90?}
    I -- Yes Hopper/Blackwell --> J[Prefer FusedAttention\ncuDNN sub-backend 1 or 2]
    I -- No pre-Hopper sm8x --> K[Prefer FlashAttention\nFA2 / FA3 / FA4]
    J --> L{FusedAttn eligible?}
    K --> M{FlashAttention eligible?}
    L -- Yes --> N[Use FusedAttention]
    L -- No --> M
    M -- Yes --> O{sm90 with FA3 & FA4?}
    O -- Yes --> P[Prefer FA3 over FA4]
    O -- No --> Q[Use highest eligible FA version]
    M -- No --> R{NVTE_UNFUSED_ATTN enabled?}
    R -- Yes --> S[Use UnfusedDotProductAttention]
    R -- No --> T[Error: no eligible backend]
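The preference stage of the flowchart can be approximated in plain Python. This is a minimal sketch, not TE's implementation: `Candidates`, `pick_flash_version`, and `select_backend` are hypothetical names, and the inputs are assumed to have already passed the stage 1 eligibility filter. It follows the ordering in the file overview (pre-Hopper: FA > FusedAttn; Hopper+: FusedAttn > FA), so on pre-Hopper it falls back to FusedAttention before the unfused path, a step the flowchart does not draw.

```python
from dataclasses import dataclass, field


@dataclass
class Candidates:
    """Hypothetical result of the stage 1 eligibility filter."""
    sm: int                       # compute capability, e.g. 80, 90, 100, 120
    fused_eligible: bool          # cuDNN FusedAttention passed all checks
    flash_versions: set = field(default_factory=set)  # eligible FA majors, e.g. {2, 4}
    unfused_enabled: bool = True  # NVTE_UNFUSED_ATTN not disabled


def pick_flash_version(sm, versions):
    """Prefer FA3 on sm90; otherwise take the highest eligible version."""
    # FA3 is sm90-only, so drop it on any other architecture.
    usable = {v for v in versions if v != 3 or sm == 90}
    if not usable:
        return None
    if sm == 90 and 3 in usable:
        return 3
    return max(usable)


def select_backend(c):
    """Apply the architecture-specific preference order to eligible backends."""
    flash = pick_flash_version(c.sm, c.flash_versions)
    order = ("fused", "flash") if c.sm >= 90 else ("flash", "fused")
    for name in order:
        if name == "fused" and c.fused_eligible:
            return "FusedAttention"
        if name == "flash" and flash is not None:
            return f"FlashAttention v{flash}"
    if c.unfused_enabled:
        return "UnfusedDotProductAttention"
    raise ValueError("no eligible attention backend")
```

For example, `select_backend(Candidates(sm=90, fused_eligible=False, flash_versions={3, 4}))` returns `"FlashAttention v3"`, matching the "Prefer FA3 over FA4" branch.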


Reviews (1): Last reviewed commit: "docs: document attention backend selecti..."

sbhavani requested a review from cyanguwa as a code owner June 24, 2026 18:31

sbhavani added the documentation label Jun 24, 2026

sbhavani force-pushed the codex/document-attention-backend-selection branch from c3e855a to 4b1ec50 Compare June 24, 2026 18:32

docs: document attention backend selection

22883e4

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>

sbhavani force-pushed the codex/document-attention-backend-selection branch from 4b1ec50 to 22883e4 Compare June 24, 2026 18:34

cyanguwa approved these changes Jun 24, 2026

sbhavani wants to merge 1 commit into NVIDIA:main from sbhavani:codex/document-attention-backend-selection
