[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark#1627

Open: seungrokj wants to merge 5 commits into main from seungrokj/dsv4-fp4-mi355x-atom-mtp

@seungrokj (Collaborator) commented May 31, 2026

Summary

  • Add new benchmark config dsv4-fp4-mi355x-atom-mtp for DeepSeek-V4-Pro with MTP3 speculative decoding on MI355X using ATOM
  • Add new script benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh with --method mtp --num-speculative-tokens 3
  • Image: rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)
  • Concurrency range: 4–256 for both ISL 1024 and 8192

Performance vs. the current InferenceX baseline (dsv4-fp4-mi355x-atom, image nightly_202605130853)

ISL   OSL   Conc  InferenceX (tok/s/GPU)  ATOM MTP3 (tok/s/GPU)  Δ%
1024  1024  4     43.91                   95.00                  +116.3%
1024  1024  8     82.81                   178.42                 +115.5%
1024  1024  16    146.53                  254.04                 +73.4%
1024  1024  32    240.89                  421.48                 +75.0%
1024  1024  64    389.30                  609.97                 +56.7%
1024  1024  128   601.21                  837.01                 +39.2%
1024  1024  256   880.78                  1086.07                +23.3%
8192  1024  4     168.42                  435.60                 +158.6%
8192  1024  8     307.43                  726.92                 +136.5%
8192  1024  16    512.60                  1055.77                +106.0%
8192  1024  32    814.94                  1450.80                +78.0%
8192  1024  64    1162.87                 1966.73                +69.1%
8192  1024  128   1469.89                 2390.46                +62.6%
8192  1024  256   704.73                  2662.51                +277.8%
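
The Δ% column is (ATOM MTP3 / InferenceX - 1) × 100. A minimal shell sketch for recomputing it from the table values; the helper name delta_pct is hypothetical, and awk is assumed to be available:

```shell
#!/usr/bin/env bash
# Recompute the Δ% column: (ATOM MTP3 / InferenceX - 1) * 100, signed, one decimal.
# delta_pct is a hypothetical helper, not part of the benchmark scripts.
delta_pct() {
  local base="$1" new="$2"
  awk -v base="$base" -v new="$new" 'BEGIN { printf "%+.1f%%\n", (new / base - 1) * 100 }'
}

# A few rows copied from the table: ISL OSL CONC BASELINE MTP3
while read -r isl osl conc base new; do
  printf '%s/%s conc=%s: %s\n' "$isl" "$osl" "$conc" "$(delta_pct "$base" "$new")"
done <<'EOF'
1024 1024 8 82.81 178.42
1024 1024 256 880.78 1086.07
8192 1024 256 704.73 2662.51
EOF
```

On the first row (43.91 → 95.00) the helper prints +116.4% rather than +116.3%, most likely because the throughput values in the table are themselves rounded. The last row's +277.8% also reflects the InferenceX baseline falling from 1469.89 tok/s/GPU at conc 128 to 704.73 at conc 256.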

Test plan

  • Verify dsv4_fp4_mi355x_atom_mtp.sh starts atom server with --method mtp --num-speculative-tokens 3
  • Confirm --use-chat-template is passed to the benchmark client
  • Confirm dsv4-fp4-mi355x-atom-mtp config picks up the new script via spec-decoding: mtp
  • Run benchmark at conc=32 to confirm throughput matches upstream numbers

🤖 Generated with Claude Code


Note

Low Risk
Benchmark-only additions (YAML config, shell script, changelog); no changes to auth, serving production paths, or shared runtime libraries beyond new optional sweep jobs.

Overview
Adds a new DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark track alongside the existing non-MTP dsv4-fp4-mi355x-atom entry.

CI / matrix: New config key dsv4-fp4-mi355x-atom-mtp in .github/configs/amd-master.yaml: image rocm/atom:rocm7.2.4_..._atom0.1.3, TP8, spec-decoding: mtp, concurrency 1–1024 for the 1024/1024 and 8192/1024 fixed-seq-len sweeps.

Runner: New benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh starts atom.entrypoints.openai_server with --method mtp --num-speculative-tokens 3, an FP8 KV cache, and optional DP/EP parallel flags, then calls run_benchmark_serving with --dsv4 to generate DSv4-formatted prompts.

Changelog: perf-changelog.yaml documents the new config key and image.

Reviewed by Cursor Bugbot for commit 04590ea.

Add new benchmark config for DeepSeek-V4-Pro with MTP3 speculative
decoding on MI355X using ATOM. Uses image
rocm/atom-dev:nightly_202605301523 with --method mtp
--num-speculative-tokens 3. Concurrency range 4-256.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first, before we can merge your single-node PR into the master branch. Let's make the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs fixes them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review from, and get approval by, the CODEOWNERS of their respective company before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

4 similar comments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review thread on benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh (outdated)
Review thread on perf-changelog.yaml (outdated)
- dsv4-fp4-mi355x-atom-mtp
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 4-256"
pr-link: TBD

🟡 The new perf-changelog.yaml entry has pr-link: TBD while every other recent entry uses a concrete https://github.com/SemiAnalysisAI/InferenceX/pull/N URL. This PR is #1627 — replacing TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1627 before merge keeps the changelog scannable. Nit, not a runtime issue.

Extended reasoning...

What the bug is

In perf-changelog.yaml, the new entry added by this PR (lines 3339-3343) for dsv4-fp4-mi355x-atom-mtp uses pr-link: TBD as a placeholder. Every other recent entry in the same file (e.g., lines 3313, 3319, 3325, 3331, 3337) carries a concrete GitHub URL of the form https://github.com/SemiAnalysisAI/InferenceX/pull/N.

Why this matters

The changelog's value comes from being able to jump quickly from a config-key change to the PR that introduced it. A TBD placeholder breaks that convention: once merged, the entry carries no usable link, and a reader looking up the provenance of the dsv4-fp4-mi355x-atom-mtp benchmark has to search PR history manually instead of clicking through. Multiplied across many such entries over time, this erodes the changelog's usefulness as a documentation artifact.

Why existing code/process didn't prevent it

There is no schema validation in the repo that enforces pr-link to be a URL — TBD is a free-form string and YAML parses it fine. The convention is documented only by example (the 70+ surrounding entries), so it relies on the PR author remembering to substitute the real number once the PR is opened.

Step-by-step proof

  1. Open the PR diff for perf-changelog.yaml — the new entry ends with pr-link: TBD at line 3343.
  2. The PR number for this change is 1627 (visible in the PR URL / metadata).
  3. Scroll up a few entries in the same file — e.g. line 3337 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1616, line 3331 has .../pull/1624, line 3325 has .../pull/1602, line 3319 has .../pull/1607. All concrete URLs.
  4. The new entry breaks that pattern.

Impact

None at runtime — perf-changelog.yaml is documentation only and isn't consumed by the benchmark/CI runner (only .github/configs/amd-master.yaml is). The impact is purely on changelog readability and convention consistency.

How to fix

Replace pr-link: TBD on line 3343 with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627

This is a one-line edit before merge.
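
Since nothing enforces the URL convention today, a check could be automated. A minimal sketch, assuming a plain grep over perf-changelog.yaml is acceptable; check_pr_links is a hypothetical helper, not something in the repo:

```shell
#!/usr/bin/env bash
# Hypothetical lint step: flag any pr-link that is not a concrete
# https://github.com/SemiAnalysisAI/InferenceX/pull/N URL (e.g. "TBD").
check_pr_links() {
  local file="$1" bad
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # grep -n keeps line numbers; the second grep drops well-formed links.
  # "|| true" keeps set -e callers alive when every link is valid.
  bad=$(grep -n 'pr-link:' "$file" \
    | grep -Ev 'pr-link:[[:space:]]*https://github\.com/SemiAnalysisAI/InferenceX/pull/[0-9]+[[:space:]]*$' \
    || true)
  if [ -n "$bad" ]; then
    printf 'placeholder or malformed pr-link:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

In CI this could run as `check_pr_links perf-changelog.yaml`, which would exit non-zero on an entry like `pr-link: TBD` and print its line number.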

@seungrokj changed the title from "Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark" to "[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark" on May 31, 2026
…attn support

- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 1-1024
- Refactor benchmark script to use PARALLEL_ARGS pattern (matching
  glm5/dsv4-atom scripts) for DP_ATTENTION + EP_SIZE combinations
- Extract SPEC_ARGS for MTP speculative decoding args
- Update perf-changelog image reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
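
The PARALLEL_ARGS/SPEC_ARGS refactor collects flags in bash arrays before launching the server. A minimal sketch of that pattern, assuming bash; the SERVER_CMD name and the `python3 -m` launcher are illustrative assumptions, and the model, FP8 KV-cache, and port flags are omitted because the script's exact values are not shown here:

```shell
#!/usr/bin/env bash
# MTP speculative-decoding flags, as listed in the PR description.
SPEC_ARGS=(--method mtp --num-speculative-tokens 3)

# Example PARALLEL_ARGS for a plain TP8 run (no DP attention, no EP).
PARALLEL_ARGS=(-tp 8)

# Hypothetical assembly step: each flag stays its own array element, so
# expanding with "${...[@]}" avoids word-splitting and quoting bugs.
# The `python3 -m` launcher is an assumption; other server flags are omitted.
SERVER_CMD=(python3 -m atom.entrypoints.openai_server
            "${PARALLEL_ARGS[@]}" "${SPEC_ARGS[@]}")

# Print instead of exec so the sketch runs without ATOM installed.
printf '%s\n' "${SERVER_CMD[*]}"
```

Keeping the speculative-decoding flags in their own array means the MTP and non-MTP scripts can share the PARALLEL_ARGS logic and differ only in SPEC_ARGS.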
@seungrokj added the AMD label May 31, 2026

@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit c311908.

- isl: 1024
  osl: 1024
  search-space:
  - { tp: 8, ep: 1, conc-start: 1, conc-end: 1024, spec-decoding: mtp }

Concurrency sweep range mismatch

Medium Severity

The new dsv4-fp4-mi355x-atom-mtp search space uses conc-start: 1 and conc-end: 1024, while the PR describes a 4–256 concurrency range and peer MI355X ATOM MTP configs (e.g. qwen3.5-fp8-mi355x-atom-mtp) use conc-start: 4 and conc-end: 256. Sweeps will run extra points below 4 and above 256 that were not validated in the reported throughput table.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit c311908.
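
The size of the mismatch can be made concrete. A minimal sketch, assuming (as the 4, 8, …, 256 rows in the performance table suggest) that the sweep doubles concurrency from conc-start up to conc-end; conc_points is a hypothetical helper, not the harness's own code:

```shell
#!/usr/bin/env bash
# Hypothetical expansion of conc-start..conc-end, ASSUMING the harness
# doubles concurrency at each step (inferred from the reported table).
conc_points() {
  local c="$1" end="$2"
  local -a out=()
  if [ "$c" -lt 1 ]; then
    echo "conc-start must be >= 1" >&2   # doubling from 0 would never terminate
    return 1
  fi
  while [ "$c" -le "$end" ]; do
    out+=("$c")
    c=$((c * 2))
  done
  echo "${out[*]}"
}

echo "4..256:  $(conc_points 4 256)"
echo "1..1024: $(conc_points 1 1024)"
```

Under that assumption, conc-start: 1 / conc-end: 1024 adds four points per ISL/OSL pair (1, 2, 512, and 1024) beyond the seven reported in the table.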

    else # DP+TP
        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention)
    fi
fi

Expert parallel only with DP

Low Severity

--enable-expert-parallel is only added when DP_ATTENTION is true and EP_SIZE is greater than 1. Repository guidance and dsv4_fp4_mi355x_atom.sh enable expert parallel whenever EP_SIZE exceeds 1, independent of data-parallel attention. A future search-space entry with ep greater than 1 and default dp-attn would not match the non-MTP ATOM script behavior.

Reviewed by Cursor Bugbot for commit c311908.
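
A minimal sketch of the decoupled logic Bugbot describes, in which --enable-expert-parallel depends only on EP_SIZE. The build_parallel_args wrapper is hypothetical; the flags and the TP / DP_ATTENTION / EP_SIZE inputs are the ones named in the snippet and comment above:

```shell
#!/usr/bin/env bash
# Hypothetical rewrite of the PARALLEL_ARGS block: DP attention and expert
# parallel are toggled independently, matching the non-MTP ATOM script as
# Bugbot describes it. The function wrapper is illustrative only.
build_parallel_args() {
  local tp="$1" dp_attention="$2" ep_size="$3"
  PARALLEL_ARGS=(-tp "$tp")
  if [ "$dp_attention" = "true" ]; then
    PARALLEL_ARGS+=(--enable-dp-attention)
  fi
  # Enable expert parallel whenever EP_SIZE > 1, with or without DP attention.
  if [ "$ep_size" -gt 1 ]; then
    PARALLEL_ARGS+=(--enable-expert-parallel)
  fi
}

build_parallel_args 8 false 8
printf '%s\n' "${PARALLEL_ARGS[*]}"
```

Appending to a single array, rather than assigning a complete array in each branch, keeps the four DP/EP combinations from drifting apart as flags are added.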

@github-actions (Contributor)

@functionstackx (Collaborator) left a comment

…late

DeepSeek-V4-Pro's tokenizer ships without a jinja chat_template, so
--use-chat-template makes benchmark_serving.py crash during request
sampling (ValueError: tokenizer.chat_template is not set), before any
traffic is sent. This failed the canary in run 26703437634.

--dsv4 routes prompts through the bundled encoding_dsv4.py encoder
(<bos><User>...<Assistant><think> framing) and auto-enables chat
templating internally, matching the dsv4 vLLM and SGLang MTP recipes.
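
A fail-fast guard for this class of failure can be sketched in shell. This assumes a local Hugging Face tokenizer directory in which a chat template, if present, lives in chat_template.jinja or as a "chat_template" entry in tokenizer_config.json; has_chat_template and prompt_flag_for are hypothetical helpers. The PR itself does not auto-detect anything; it simply passes --dsv4:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does a local tokenizer directory ship a
# chat template? ASSUMES the template is either chat_template.jinja or a
# "chat_template" string/list value in tokenizer_config.json.
has_chat_template() {
  local dir="$1"
  [ -f "$dir/chat_template.jinja" ] && return 0
  grep -Eq '"chat_template"[[:space:]]*:[[:space:]]*["[]' \
    "$dir/tokenizer_config.json" 2>/dev/null
}

# Pick the client flag: the generic template path when one exists,
# otherwise the bundled DSv4 encoder path (--dsv4).
prompt_flag_for() {
  if has_chat_template "$1"; then
    printf '%s\n' "--use-chat-template"
  else
    printf '%s\n' "--dsv4"
  fi
}
```

Running a check like this before launching the client would surface the missing template at setup time instead of as a ValueError during request sampling.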
@github-actions (Contributor)

3 participants