docs: document attention backend selection #3142

Open
sbhavani wants to merge 1 commit into NVIDIA:main from sbhavani:codex/document-attention-backend-selection

Conversation

@sbhavani

Collaborator

Description

Documents TE’s high-level attention backend selection logic, including Hopper/Blackwell preferences, FA v3/v4 behavior, and cuDNN FusedAttention sub-backend context.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • docs/examples/attention/attention.ipynb: Expanded the backend selection section with Hopper/Blackwell preference order, FA v3/v4 behavior, cuDNN sub-backend context, and fallback behavior.

  • docs/envvars.rst: Added a compact backend-selection overview and clarified NVTE_FUSED_ATTN_BACKEND.

  • transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py: Updated the API docstring to match the current architecture-specific backend priority.
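
As a rough illustration of the variables these changes document, the sketch below builds the backend toggles in Python and applies them to the process environment, which would need to happen before Transformer Engine is imported. The helper `attention_env` and its parameters are hypothetical and not part of TE; the variable names come from this PR, and the sketch assumes `0` disables and `1` enables a backend. Per the updated docs, `NVTE_FUSED_ATTN_BACKEND` is treated as a request rather than a guarantee.

```python
import os


def attention_env(flash=True, fused=True, unfused=True, fused_sub_backend=None):
    """Hypothetical helper: map backend toggles to NVTE_* variables.

    Assumes "1" enables and "0" disables a backend.
    """
    env = {
        "NVTE_FLASH_ATTN": "1" if flash else "0",
        "NVTE_FUSED_ATTN": "1" if fused else "0",
        "NVTE_UNFUSED_ATTN": "1" if unfused else "0",
    }
    if fused_sub_backend is not None:
        # A request for a cuDNN sub-backend, not a force: selection may
        # still choose something else if the request is not eligible.
        env["NVTE_FUSED_ATTN_BACKEND"] = str(fused_sub_backend)
    return env


# Example: filter out FlashAttention and request cuDNN sub-backend 1.
settings = attention_env(flash=False, fused_sub_backend=1)
os.environ.update(settings)
```

Setting the variables in-process like this is only one option; exporting them in the shell before launching the job has the same effect.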

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@sbhavani sbhavani requested a review from cyanguwa as a code owner June 24, 2026 18:31
@sbhavani sbhavani added the documentation Improvements or additions to documentation label Jun 24, 2026
@sbhavani sbhavani force-pushed the codex/document-attention-backend-selection branch from c3e855a to 4b1ec50 Compare June 24, 2026 18:32
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
@sbhavani sbhavani force-pushed the codex/document-attention-backend-selection branch from 4b1ec50 to 22883e4 Compare June 24, 2026 18:34
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Contributor

Greptile Summary

This PR documents TE's attention backend selection logic across three files, covering Hopper/Blackwell GPU preference ordering, FA v3/v4 behavior, and cuDNN FusedAttention sub-backend context. The documentation is consistent with the actual backend-selection code in utils.py.

  • attention.ipynb replaces a high-level, partially-outdated selection table with an architecture-specific breakdown (sm8x/sm90/sm100+sm120), removes the obsolete sub-backend 0 row (which no longer exists in the C++ NVTE_Fused_Attn_Backend enum), and correctly adds FA v3/v4 preference notes and the clarified env-var support table.
  • envvars.rst adds a new introductory section explaining the two-stage (filter → preference-order) selection model and expands the NVTE_FUSED_ATTN_BACKEND description to clarify it is a request, not a force.
  • dot_product_attention.py updates the API docstring to match the current Hopper-vs-pre-Hopper preference behavior.
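
The two-stage model summarized above can be sketched in a few lines of Python. This is an illustrative simplification, not the code in utils.py: `select_backend` and its arguments are hypothetical, stage 1 is assumed to have already produced the set of eligible backends, and the fallback from FlashAttention to FusedAttention on pre-Hopper GPUs is an assumption based on the "FA > FusedAttn" ordering in the docstring.

```python
def select_backend(arch, eligible):
    """Stage 2 of a simplified selection model.

    arch: compute capability as an integer (80, 90, 100, ...).
    eligible: set of backend names that survived the stage 1 filter.
    """
    if arch >= 90:
        # Hopper and newer: FusedAttention is preferred.
        order = ["FusedAttention", "FlashAttention", "UnfusedDotProductAttention"]
    else:
        # Pre-Hopper (sm8x): FlashAttention is preferred.
        order = ["FlashAttention", "FusedAttention", "UnfusedDotProductAttention"]
    for name in order:
        if name in eligible:
            return name
    raise RuntimeError("no eligible attention backend")


both = {"FlashAttention", "FusedAttention"}
on_hopper = select_backend(90, both)
on_ampere = select_backend(80, both)
hopper_without_fused = select_backend(90, {"FlashAttention"})
```

With both backends eligible, the sketch returns FusedAttention on sm90 and FlashAttention on sm80, which is the behavior the updated docstring describes.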

Confidence Score: 5/5

Documentation-only change with no impact on runtime behavior; safe to merge.

All three documentation changes were cross-checked against the actual backend-selection code in utils.py and the C++ NVTE_Fused_Attn_Backend enum. FA3 sm90-only restriction, FA4 sm80+ coverage, the FA3-over-FA4 preference on Hopper, the FusedAttention-first ordering on Hopper+ (>=sm90), the FP8 disablement on sm120, and the removal of sub-backend 0 (which no longer exists as an enum value) are all consistent with the implementation. The revised env-var support table in the notebook (separating PyTorch-only from PyTorch+JAX variables) was verified against the JAX source.
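
The FlashAttention version rules listed in this paragraph (FA3 on sm90 only, FA4 on sm80+, FA3 preferred over FA4 on Hopper) can be made concrete with a small sketch. The function names are hypothetical, the sm80+ floor for FA2 is taken from the flowchart in this review, and the "highest eligible version otherwise" rule is likewise taken from the flowchart rather than from the TE source.

```python
def eligible_flash_versions(arch, installed):
    """Return the installed FlashAttention major versions usable on `arch`."""
    supported = set()
    if arch >= 80:
        supported.update({2, 4})  # FA2 and FA4: sm80 and newer
    if arch == 90:
        supported.add(3)          # FA3: sm90 only
    return sorted(supported & set(installed))


def pick_flash_version(arch, installed):
    """Pick one FlashAttention version, or None if none is eligible."""
    versions = eligible_flash_versions(arch, installed)
    if not versions:
        return None
    if arch == 90 and 3 in versions:
        return 3                  # Hopper: prefer FA3 over FA4
    return versions[-1]           # otherwise the highest eligible version


hopper_choice = pick_flash_version(90, [2, 3, 4])
blackwell_choice = pick_flash_version(100, [2, 3, 4])
ampere_choice = pick_flash_version(80, [2, 3])
```

Under these assumptions, an sm90 machine with all three versions installed ends up on FA3, while sm100 skips FA3 entirely and uses FA4.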

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/examples/attention/attention.ipynb | Removes the obsolete sub-backend 0 table row, adds an architecture-specific preference order table, expands the FA v3/v4 notes, and corrects the env-var support list; all accurately reflect the code. |
| docs/envvars.rst | Adds a two-stage backend-selection overview and improves the NVTE_FUSED_ATTN_BACKEND description; both additions are accurate. |
| transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py | Docstring updated to accurately describe the architecture-aware preference ordering (pre-Hopper: FA > FusedAttn; Hopper+: FusedAttn > FA), matching the code in utils.py. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DotProductAttention call] --> B{Stage 1: Filter by eligibility}
    B --> C[Check env vars\nNVTE_FLASH_ATTN / NVTE_FUSED_ATTN / NVTE_UNFUSED_ATTN]
    C --> D[Check GPU arch\nsm80+ for FA2/FusedAttn\nsm90 only for FA3\nsm80+ for FA4]
    D --> E[Check installed flash-attn version\nFA2 / FA3 / FA4]
    E --> F[Check cuDNN version & data type\nBF16/FP16 → sub-backend 1\nFP8 DPA → sub-backend 2]
    F --> G[Check input config\nseqlen, heads, layout, mask, dropout, etc.]
    G --> H{Stage 2: Apply preference order}
    H --> I{GPU arch >= sm90?}
    I -- Yes Hopper/Blackwell --> J[Prefer FusedAttention\ncuDNN sub-backend 1 or 2]
    I -- No pre-Hopper sm8x --> K[Prefer FlashAttention\nFA2 / FA4]
    J --> L{FusedAttn eligible?}
    K --> M{FlashAttention eligible?}
    L -- Yes --> N[Use FusedAttention]
    L -- No --> M
    M -- Yes --> O{sm90 with FA3 & FA4?}
    O -- Yes --> P[Prefer FA3 over FA4]
    O -- No --> Q[Use highest eligible FA version]
    M -- No --> R{NVTE_UNFUSED_ATTN enabled?}
    R -- Yes --> S[Use UnfusedDotProductAttention]
    R -- No --> T[Error: no eligible backend]

Reviews (1): Last reviewed commit: "docs: document attention backend selecti..."

Labels

documentation Improvements or additions to documentation

2 participants