[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark #1627
seungrokj wants to merge 5 commits into
Add new benchmark config for DeepSeek-V4-Pro with MTP3 speculative decoding on MI355X using ATOM. Uses image rocm/atom-dev:nightly_202605301523 with --method mtp --num-speculative-tokens 3. Concurrency range 4-256.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- dsv4-fp4-mi355x-atom-mtp
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 4-256"
pr-link: TBD
🟡 The new perf-changelog.yaml entry has pr-link: TBD while every other recent entry uses a concrete https://github.com/SemiAnalysisAI/InferenceX/pull/N URL. This PR is #1627 — replacing TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1627 before merge keeps the changelog scannable. Nit, not a runtime issue.
Extended reasoning...
What the bug is
In perf-changelog.yaml, the new entry added by this PR (lines 3339-3343) for dsv4-fp4-mi355x-atom-mtp uses pr-link: TBD as a placeholder. Every other recent entry in the same file (e.g., lines 3313, 3319, 3325, 3331, 3337) carries a concrete GitHub URL of the form https://github.com/SemiAnalysisAI/InferenceX/pull/N.
Why this matters
The changelog's value comes from being able to quickly jump from a config-key change to the PR that introduced it. A TBD placeholder breaks that scannability convention: once merged, the link will remain dead and a reader looking up the provenance of the dsv4-fp4-mi355x-atom-mtp benchmark has to manually search PR history instead of clicking through. Multiplied across many such entries over time, this erodes the changelog's usefulness as a documentation artifact.
Why existing code/process didn't prevent it
There is no schema validation in the repo that enforces pr-link to be a URL — TBD is a free-form string and YAML parses it fine. The convention is documented only by example (the 70+ surrounding entries), so it relies on the PR author remembering to substitute the real number once the PR is opened.
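A check like the following could enforce the convention in CI. This is a minimal sketch, not an existing repo script: the function name check_pr_links is hypothetical, and it assumes every pr-link sits on its own line in perf-changelog.yaml and should point at a SemiAnalysisAI/InferenceX pull request.

```bash
#!/usr/bin/env bash
# Hypothetical lint (not part of the repo): report pr-link values that are not
# concrete https://github.com/SemiAnalysisAI/InferenceX/pull/N URLs.
# Assumes one "pr-link: <value>" per line, as in perf-changelog.yaml.
check_pr_links() {
  local file="$1"
  local bad
  # grep -n keeps line numbers so offending entries are easy to locate.
  bad=$(grep -nE '^[[:space:]]*pr-link:' "$file" \
    | grep -vE 'pr-link:[[:space:]]*https://github\.com/SemiAnalysisAI/InferenceX/pull/[0-9]+[[:space:]]*$')
  if [[ -n "$bad" ]]; then
    printf 'pr-link is not a PR URL:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

Run as check_pr_links perf-changelog.yaml, it should flag the pr-link: TBD entry with its line number and return non-zero.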
Step-by-step proof
- Open the PR diff for perf-changelog.yaml: the new entry ends with pr-link: TBD at line 3343.
- The PR number for this change is 1627 (visible in the PR URL / metadata).
- Scroll up a few entries in the same file: e.g. line 3337 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1616, line 3331 has .../pull/1624, line 3325 has .../pull/1602, line 3319 has .../pull/1607. All are concrete URLs.
- The new entry breaks that pattern.
Impact
None at runtime — perf-changelog.yaml is documentation only and isn't consumed by the benchmark/CI runner (only .github/configs/amd-master.yaml is). The impact is purely on changelog readability and convention consistency.
How to fix
Replace pr-link: TBD on line 3343 with:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627
This is a one-line edit before merge.
…attn support
- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 1-1024
- Refactor benchmark script to use PARALLEL_ARGS pattern (matching glm5/dsv4-atom scripts) for DP_ATTENTION + EP_SIZE combinations
- Extract SPEC_ARGS for MTP speculative decoding args
- Update perf-changelog image reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
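The PARALLEL_ARGS / SPEC_ARGS refactor relies on bash arrays: each flag group is built once and spliced into the launch command with "${ARR[@]}", which keeps every flag a separate word. A minimal sketch, assuming the server is launched as python3 -m atom.entrypoints.openai_server; NUM_SPEC_TOKENS is a hypothetical knob (the PR hard-codes 3), and the real script's model path, KV-cache, and port flags are omitted because they are not shown here.

```bash
#!/usr/bin/env bash
# Sketch only: flag groups as arrays, spliced into one launch command.
TP="${TP:-8}"
NUM_SPEC_TOKENS="${NUM_SPEC_TOKENS:-3}"   # hypothetical knob; the PR uses 3

# MTP speculative decoding flags (SPEC_ARGS), as described in the PR.
SPEC_ARGS=(--method mtp --num-speculative-tokens "$NUM_SPEC_TOKENS")

# Plain TP case; DP-attention / expert-parallel variants would append here.
PARALLEL_ARGS=(-tp "$TP")

# Assumed entry point; model, KV-cache and port flags are omitted.
cmd=(python3 -m atom.entrypoints.openai_server "${PARALLEL_ARGS[@]}" "${SPEC_ARGS[@]}")

# Dry run: print the shell-quoted command instead of executing it.
printf '%q ' "${cmd[@]}"
echo
```

Keeping the two groups apart means the DP/EP logic and the MTP logic can change independently without touching the launch line.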
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c311908. Configure here.
- isl: 1024
  osl: 1024
  search-space:
  - { tp: 8, ep: 1, conc-start: 1, conc-end: 1024, spec-decoding: mtp }
Concurrency sweep range mismatch
Medium Severity
The new dsv4-fp4-mi355x-atom-mtp search space uses conc-start: 1 and conc-end: 1024, while the PR describes a 4–256 concurrency range and peer MI355X ATOM MTP configs (e.g. qwen3.5-fp8-mi355x-atom-mtp) use conc-start: 4 and conc-end: 256. Sweeps will run extra points below 4 and above 256 that were not validated in the reported throughput table.
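To see which extra points the wider range adds, here is a sketch that enumerates a sweep, assuming (not confirmed by the PR) that the runner doubles concurrency from conc-start until it exceeds conc-end. conc_points is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list concurrency points for a doubling sweep.
# Assumption: the sweep visits conc-start, 2*conc-start, ... up to conc-end.
conc_points() {
  local c="$1" end="$2"
  local -a out=()
  while (( c <= end )); do
    out+=("$c")
    (( c *= 2 ))
  done
  echo "${out[*]}"
}

conc_points 4 256     # range described in the PR and used by peer configs
conc_points 1 1024    # range in the new search space
```

Under that assumption, 1–1024 adds the points 1, 2, 512 and 1024 on top of the 4–256 sweep.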
    else #DP+TP
        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
    fi
fi
Expert parallel only with DP
Low Severity
--enable-expert-parallel is only added when DP_ATTENTION is true and EP_SIZE is greater than 1. Repository guidance and dsv4_fp4_mi355x_atom.sh enable expert parallel whenever EP_SIZE exceeds 1, independent of data-parallel attention. A future search-space entry with ep greater than 1 and default dp-attn would not match the non-MTP ATOM script behavior.
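One way to express the suggested behavior, where expert parallel depends only on EP_SIZE. This is a sketch, not the PR's code: the build_parallel_args wrapper and the flag order are hypothetical, while -tp, --enable-dp-attention and --enable-expert-parallel are the flags named in the script and review.

```bash
#!/usr/bin/env bash
# Sketch of the Bugbot suggestion: add --enable-expert-parallel whenever
# EP_SIZE > 1, independent of DP_ATTENTION. The wrapper is hypothetical.
build_parallel_args() {
  local tp="$1" ep_size="${2:-1}" dp_attention="${3:-false}"
  PARALLEL_ARGS=(-tp "$tp")                 # global, consumed by the launch line
  if [[ "$dp_attention" == "true" ]]; then
    PARALLEL_ARGS+=(--enable-dp-attention)
  fi
  if (( ep_size > 1 )); then
    PARALLEL_ARGS+=(--enable-expert-parallel)
  fi
}
```

Assuming the ep field maps to EP_SIZE, a future entry such as ep: 8 without dp-attn would then get --enable-expert-parallel, matching the non-MTP dsv4_fp4_mi355x_atom.sh behavior Bugbot describes.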
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26703437634
functionstackx left a comment
@seungrokj it is failing, can u take a look?
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26703437634/job/78700341161?pr=1627
…late

DeepSeek-V4-Pro's tokenizer ships without a jinja chat_template, so --use-chat-template makes benchmark_serving.py crash during request sampling (ValueError: tokenizer.chat_template is not set), before any traffic is sent. This failed the canary in run 26703437634. --dsv4 routes prompts through the bundled encoding_dsv4.py encoder (<bos><User>...<Assistant><think> framing) and auto-enables chat templating internally, matching the dsv4 vLLM and SGLang MTP recipes.
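A pre-flight check could catch a missing template before the benchmark starts. A minimal sketch for this DSv4 recipe only: pick_prompt_flag is hypothetical, and it assumes the template, when present, is stored under a "chat_template" key in the model's tokenizer_config.json (tokenizers that ship a separate .jinja file would be missed by this check).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight for the DSv4 recipe: choose the prompt-formatting flag
# passed to benchmark_serving.py based on whether a chat template ships.
# Assumes the template, if any, is a "chat_template" key in tokenizer_config.json.
pick_prompt_flag() {
  local cfg="$1"
  if [[ -f "$cfg" ]] && grep -q '"chat_template"' "$cfg"; then
    echo "--use-chat-template"
  else
    # No jinja template: fall back to the bundled DSv4 encoder path.
    echo "--dsv4"
  fi
}
```

The PR itself simply always passes --dsv4, which is the simpler choice for a DSv4-only script; a check like this is mainly useful if one script is shared across models.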
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26722395974


Summary
- dsv4-fp4-mi355x-atom-mtp for DeepSeek-V4-Pro with MTP3 speculative decoding on MI355X using ATOM
- benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh with --method mtp --num-speculative-tokens 3
- rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)
Test plan
- dsv4_fp4_mi355x_atom_mtp.sh starts the atom server with --method mtp --num-speculative-tokens 3
- --use-chat-template is passed to the benchmark client
- dsv4-fp4-mi355x-atom-mtp config picks up the new script via spec-decoding: mtp

🤖 Generated with Claude Code
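The last test-plan item depends on how a config maps to a script. A sketch under an assumed convention (hyphens become underscores and _mtp is appended when spec-decoding is mtp), which matches the two script names in this PR, dsv4_fp4_mi355x_atom.sh and dsv4_fp4_mi355x_atom_mtp.sh; script_for is hypothetical and the real runner may resolve scripts differently.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from a base config name + spec-decoding mode to a script.
# Assumed convention only; the actual runner logic is not shown in this PR.
script_for() {
  local base="$1" spec="$2"
  local name="${base//-/_}"          # dsv4-fp4-mi355x-atom -> dsv4_fp4_mi355x_atom
  if [[ "$spec" == "mtp" ]]; then
    name+="_mtp"
  fi
  echo "benchmarks/single_node/${name}.sh"
}

script_for dsv4-fp4-mi355x-atom mtp
script_for dsv4-fp4-mi355x-atom none
```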
Note
Low Risk
Benchmark-only additions (YAML config, shell script, changelog); no changes to auth, serving production paths, or shared runtime libraries beyond new optional sweep jobs.
Overview
Adds a new DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark track alongside the existing non-MTP dsv4-fp4-mi355x-atom entry.

CI / matrix: New config key dsv4-fp4-mi355x-atom-mtp in .github/configs/amd-master.yaml (image rocm/atom:rocm7.2.4_..._atom0.1.3, TP8, spec-decoding: mtp, concurrency 1–1024 for 1024/1024 and 8192/1024 fixed-seq-len sweeps).

Runner: New benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh starts atom.entrypoints.openai_server with --method mtp --num-speculative-tokens 3, FP8 KV cache, and optional DP/EP parallel flags, then runs run_benchmark_serving with --dsv4 for DSv4-appropriate prompts.

Changelog: perf-changelog.yaml documents the new config key and image.

Reviewed by Cursor Bugbot for commit 04590ea. Bugbot is set up for automated code reviews on this repo. Configure here.