[DeepSeek-V4] Implement model integration, decoders, and configuration stack #4153
Open
parambole wants to merge 1 commit into
2a19018 to
23adce0
🤖 Hi @parambole, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
🤖 I'm sorry @parambole, but I was unable to process your request. Please see the logs for more details.
dipakg-lang reviewed Jun 12, 2026
This commit introduces full support for DeepSeek V4 by integrating its compressed attention mechanisms, MoE routing, and architectural layers.

Key changes:
- Add `deepseek4.yml` configuration and `DeepSeek4DecoderLayer` implementation.
- Implement hybrid Hash Routing and Token Routing for MoE layers.
- Add prefix/suffix layer unrolling for non-uniform compression blocks.
- Fix Pydantic validation for base MLP dimensions.
- Bypass MLA instantiation in favor of native CompressedAttention (CSA/HCA).
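The prefix/suffix unrolling mentioned in the commit message can be pictured with a small, framework-free sketch. It uses the compression ratios quoted in the PR description (`[0, 0]` prefix, alternating `[4, 128]` middle, `[4, 0]` suffix). The function names and the block count of 3 are hypothetical and chosen only for illustration; this is not the code in the PR.

```python
from typing import List, Sequence, Tuple


def unroll_compression_ratios(
    prefix: Sequence[int],
    block: Sequence[int],
    num_blocks: int,
    suffix: Sequence[int],
) -> List[int]:
    """Return one compression ratio per layer: prefix + repeated block + suffix."""
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    return list(prefix) + list(block) * num_blocks + list(suffix)


def segment_bounds(
    prefix: Sequence[int],
    block: Sequence[int],
    num_blocks: int,
    suffix: Sequence[int],
) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) layer index ranges, end exclusive."""
    prefix_end = len(prefix)
    middle_end = prefix_end + len(block) * num_blocks
    suffix_end = middle_end + len(suffix)
    return [
        ("prefix", 0, prefix_end),            # unrolled layers before the scan
        ("scanned", prefix_end, middle_end),  # uniform block, repeated num_blocks times
        ("suffix", middle_end, suffix_end),   # unrolled layers after the scan
    ]


# Ratios from the PR description; num_blocks=3 is an assumed, illustrative value.
ratios = unroll_compression_ratios([0, 0], [4, 128], 3, [4, 0])
bounds = segment_bounds([0, 0], [4, 128], 3, [4, 0])
print(ratios)
print(bounds)
```

Only the middle segment repeats a fixed pattern, which is what makes it a candidate for a scanned loop, while the prefix and suffix layers break the pattern and have to be instantiated individually.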
23adce0 to
6deaacc
entrpn reviewed Jun 12, 2026
entrpn (Collaborator) left a comment:
just one comment, everything else looks good.
Collaborator
Are you able to do a real run and check the profile to see whether the scan blocks are ordered as expected? A compile test won't be able to catch a runtime error.
Description
This PR introduces native architectural and routing support for the DeepSeek V4 model in MaxText.
Why & What: DeepSeek V4 introduces non-uniform architectural features that require explicit configuration unrolling. This PR solves the integration by implementing:
[0, 0] prefix compression ratios, the perfectly alternating [4, 128] scanned middle layers, and the [4, 0] suffix layers.
Tests
tests/unit/deepseek_v4_vs_reference_test.py.
v5p-512 mesh to guarantee memory constraints and HLO generation.
Compile Command to Reproduce:
Proof of Compilation:
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.