use MTP if model_dir and draft_model_dir are equal #424
suspicious-pineapple wants to merge 1 commit into
This seems to be the minimal set of changes needed to make MTP work. Tested with <https://huggingface.co/turboderp/Qwen3.6-27B-MTP-exl3>.
I can't get this to work yet. I'm on exllamav3 9c5009efaa2cda8ed341369123bb4acfe18ae300. AI-generated report below:

Bug Report: AttributeError during MTP Draft Model Generation

Description
When initiating a chat completion with Multi-Token Prediction (MTP) enabled via the ExLlamaV3 backend, the generation process crashes. The error indicates that a linear module's inner component (…

Steps to Reproduce

Expected Behavior
The model should successfully iterate through draft tokens using MTP and stream the completion without crashing.

Actual Behavior
The server raises an …

Error Log & Traceback Analysis
Critical Error:
Call Stack Highlights:

Potential Causes

Environment

Note: This bug report was drafted with the assistance of AI based on the provided traceback log.
Why should this feature be added?
This seems to be the minimal set of changes needed to make MTP work on the latest exl3 dev branch.
Examples
MTP is enabled if the main model is the same as the draft model (model_dir and draft_model_dir are equal). Otherwise, it behaves normally.
Maybe this would be more sanely exposed as a config option?
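The selection rule described above can be sketched as follows. This is a hypothetical illustration, not the code in this PR: the function name `should_use_mtp` and the `use_mtp` override (one way the suggested config option might look) are assumptions. Resolving both paths before comparing them is also a choice made here, not something the PR is known to do.

```python
from pathlib import Path
from typing import Optional


def should_use_mtp(
    model_dir: str,
    draft_model_dir: Optional[str],
    use_mtp: Optional[bool] = None,
) -> bool:
    """Decide whether to draft with the model's own MTP head.

    Hypothetical helper: `use_mtp` stands in for the config option
    floated in this PR; it is not an existing setting.
    """
    # An explicit config value, if provided, overrides the implicit rule.
    if use_mtp is not None:
        return use_mtp
    # No draft model configured: plain generation, no speculative decoding.
    if not draft_model_dir:
        return False
    # Implicit rule: same directory for main and draft model means MTP.
    # Resolving makes "models/foo" and "./models/foo/" compare equal
    # (resolve() does not require the paths to exist).
    return Path(model_dir).resolve() == Path(draft_model_dir).resolve()


if __name__ == "__main__":
    print(should_use_mtp("models/qwen", "./models/qwen/"))  # True
    print(should_use_mtp("models/qwen", "models/small-draft"))  # False
```

Comparing resolved paths rather than raw strings avoids surprising misses when the same directory is written two different ways in the config. An explicit option would make the intent clearer than relying on the two directories being equal.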
Additional context
Tested with https://huggingface.co/turboderp/Qwen3.6-27B-MTP-exl3. You need to download the safetensors file and put it in the model directory; I assume it will be included by default in future quants, where supported.