[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark by seungrokj · Pull Request #1627 · SemiAnalysisAI/InferenceX

seungrokj · 2026-05-31T03:33:58Z

Summary

Add new benchmark config dsv4-fp4-mi355x-atom-mtp for DeepSeek-V4-Pro with MTP3 speculative decoding on MI355X using ATOM
Add new script benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh with --method mtp --num-speculative-tokens 3
Image: rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)
Concurrency range: 4–256 for both ISL 1024 and 8192
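The sweep in the Summary covers ISL 1024 and 8192. The table below suggests OSL is 1024 and concurrency doubles from 4 to 256, and the sweep can be enumerated as follows. This is a minimal Python sketch: `sweep_points` and the doubling assumption are illustrative and are not taken from the benchmark script or config.

```python
from itertools import product

# Assumption: concurrency doubles from 4 to 256, matching the rows in the
# performance table; OSL is fixed at 1024 for both ISLs.
ISLS = (1024, 8192)
OSL = 1024
CONCURRENCIES = [2 ** k for k in range(2, 9)]  # 4, 8, 16, 32, 64, 128, 256


def sweep_points():
    """Return every (ISL, OSL, concurrency) combination in the sweep."""
    return [(isl, OSL, conc) for isl, conc in product(ISLS, CONCURRENCIES)]


if __name__ == "__main__":
    points = sweep_points()
    for isl, osl, conc in points:
        print(f"ISL={isl} OSL={osl} conc={conc}")
    print(f"{len(points)} points in total")
```

This yields 14 points, one per row of the performance table.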

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

ISL	OSL	Concurrency	InferenceX baseline (tok/s/GPU)	ATOM MTP3 (tok/s/GPU)	Δ%
1024	1024	4	43.91	95.00	+116.3%
1024	1024	8	82.81	178.42	+115.5%
1024	1024	16	146.53	254.04	+73.4%
1024	1024	32	240.89	421.48	+75.0%
1024	1024	64	389.30	609.97	+56.7%
1024	1024	128	601.21	837.01	+39.2%
1024	1024	256	880.78	1086.07	+23.3%
8192	1024	4	168.42	435.60	+158.6%
8192	1024	8	307.43	726.92	+136.5%
8192	1024	16	512.60	1055.77	+106.0%
8192	1024	32	814.94	1450.80	+78.0%
8192	1024	64	1162.87	1966.73	+69.1%
8192	1024	128	1469.89	2390.46	+62.6%
8192	1024	256	704.73	2662.51	+277.8%
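Each Δ% value is the relative throughput change, (ATOM MTP3 / baseline - 1) × 100. The following minimal Python sketch recomputes it from the table. The throughputs shown are already rounded to two decimals, so recomputed values can differ from the reported ones by about 0.05 percentage points. For example, the first row recomputes to roughly 116.35% against the reported +116.3%. `ROWS` and `delta_pct` are illustrative names.

```python
# Rows copied from the table above:
# (ISL, OSL, concurrency, baseline tok/s/GPU, ATOM MTP3 tok/s/GPU, reported delta %)
ROWS = [
    (1024, 1024, 4, 43.91, 95.00, 116.3),
    (1024, 1024, 8, 82.81, 178.42, 115.5),
    (1024, 1024, 16, 146.53, 254.04, 73.4),
    (1024, 1024, 32, 240.89, 421.48, 75.0),
    (1024, 1024, 64, 389.30, 609.97, 56.7),
    (1024, 1024, 128, 601.21, 837.01, 39.2),
    (1024, 1024, 256, 880.78, 1086.07, 23.3),
    (8192, 1024, 4, 168.42, 435.60, 158.6),
    (8192, 1024, 8, 307.43, 726.92, 136.5),
    (8192, 1024, 16, 512.60, 1055.77, 106.0),
    (8192, 1024, 32, 814.94, 1450.80, 78.0),
    (8192, 1024, 64, 1162.87, 1966.73, 69.1),
    (8192, 1024, 128, 1469.89, 2390.46, 62.6),
    (8192, 1024, 256, 704.73, 2662.51, 277.8),
]


def delta_pct(baseline: float, candidate: float) -> float:
    """Relative throughput change of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for isl, osl, conc, base, mtp, reported in ROWS:
        computed = delta_pct(base, mtp)
        # Inputs are rounded to 2 decimals, so allow a small gap against
        # the reported 1-decimal value.
        assert abs(computed - reported) < 0.1, (isl, conc, computed, reported)
        print(f"{isl:>5} {osl:>5} {conc:>4} {computed:+8.2f}% (reported {reported:+.1f}%)")
```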

Test plan

Verify that dsv4_fp4_mi355x_atom_mtp.sh starts the ATOM server with --method mtp --num-speculative-tokens 3
Confirm that --use-chat-template is passed to the benchmark client
Confirm that the dsv4-fp4-mi355x-atom-mtp config picks up the new script via spec-decoding: mtp
Run the benchmark at conc=32 to confirm throughput matches the upstream numbers
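The first two checks in the test plan can be scripted as a plain text search over the runner script. This is a minimal Python sketch. It assumes the flags appear literally in the script, so a value supplied through a shell variable (for example a hypothetical `$NUM_SPEC_TOKENS`) would not match. `missing_flags` is an illustrative helper, not part of the repository.

```python
from __future__ import annotations

import re
from pathlib import Path

# Flags the test plan expects, copied from the PR description.
REQUIRED_FLAGS = ("--method mtp", "--num-speculative-tokens 3", "--use-chat-template")


def missing_flags(script_text: str, required=REQUIRED_FLAGS) -> list[str]:
    """Return the required flags that do not appear literally in script_text."""
    # Join backslash-continued lines and collapse whitespace so that a flag
    # split across a line continuation still matches.
    flat = re.sub(r"\\\n", " ", script_text)
    flat = re.sub(r"\s+", " ", flat)
    # (?!\S) stops "--num-speculative-tokens 3" from matching "... 30".
    return [f for f in required if not re.search(re.escape(f) + r"(?!\S)", flat)]


if __name__ == "__main__":
    script = Path("benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh")
    if script.exists():
        missing = missing_flags(script.read_text())
        print("missing:", missing if missing else "none")
    else:
        print(f"{script} not found; run from the repository root")
```

A purely textual check like this is cheap to run in CI, but it cannot confirm that the server actually received the flags at runtime.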

🤖 Generated with Claude Code

Note

Low Risk
Benchmark-only additions (YAML config, shell script, changelog); no changes to auth, serving production paths, or shared runtime libraries beyond new optional sweep jobs.

Overview
Adds a new DeepSeek-V4-Pro FP4 MI355X ATOM MTP3 benchmark track alongside the existing non-MTP dsv4-fp4-mi355x-atom entry.

CI / matrix: New config key dsv4-fp4-mi355x-atom-mtp in .github/configs/amd-master.yaml — rocm/atom:rocm7.2.4_..._atom0.1.3, TP8, spec-decoding: mtp, concurrency 1–1024 for 1024/1024 and 8192/1024 fixed-seq-len sweeps.

Runner: New benchmarks/single_node/dsv4_fp4_mi355x_atom_mtp.sh starts atom.entrypoints.openai_server with --method mtp --num-speculative-tokens 3, FP8 KV cache, and optional DP/EP parallel flags, then calls run_benchmark_serving with --dsv4 for DSv4-appropriate prompts.
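The server launch described above can be sketched as an argv list. This is a minimal Python sketch. The module path and the two MTP flags come from this PR. The `--model` flag name and the model path are hypothetical placeholders. The FP8 KV-cache and DP/EP flags are left to `extra_args` because their exact names are not given here. `build_server_cmd` is an illustrative helper, not the script's actual code.

```python
import shlex
import sys


def build_server_cmd(model_path: str, extra_args=()) -> list:
    """Assemble argv for the ATOM OpenAI-compatible server with MTP3 enabled."""
    return [
        sys.executable, "-m", "atom.entrypoints.openai_server",
        "--model", model_path,  # hypothetical flag name
        "--method", "mtp",  # speculative decoding method, from the PR
        "--num-speculative-tokens", "3",  # MTP3: three draft tokens per step
        *extra_args,  # e.g. FP8 KV cache or DP/EP flags (names not shown here)
    ]


if __name__ == "__main__":
    cmd = build_server_cmd("/models/DeepSeek-V4-Pro")  # hypothetical path
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting issues when it is later passed to `subprocess`.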

Changelog: perf-changelog.yaml documents the new config key and image.

Reviewed by Cursor Bugbot for commit 04590ea. Bugbot is set up for automated code reviews on this repo.

Add new benchmark config for DeepSeek-V4-Pro with MTP3 speculative decoding on MI355X using ATOM. Uses image rocm/atom-dev:nightly_202605301523 with --method mtp --num-speculative-tokens 3. Concurrency range 4-256. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-31T03:34:05Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first, before we can merge your single-node PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Failures are often just flakes, and simply re-running the failed jobs will fix them. If you re-run failed jobs, you remain responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow