Add OLMoE architecture adapter tests #1333
Download-free unit suite for OlmoeArchitectureAdapter (50 tests, no real checkpoints), following the existing adapter-test pattern. Covers config defaults, QKVO weight conversions with GQA head counts, component mapping (including Q/K-norm and the MoE bridge), factory dispatch, numerical conversion round-trips, GQA forward hook shapes with Q/K-norm wired, setup_component_testing, prepare_model in-place-clamp patching, and architecture guards.
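As a rough illustration of what the QKVO weight-conversion and round-trip tests check, the sketch below splits synthetic projection weights into per-head blocks and merges them back. It uses plain Python lists, so it runs without downloads or a GPU. It is not code from the PR: `split_heads`, `merge_heads`, and `kv_head_count` are hypothetical helpers. It assumes an `nn.Linear`-style `(n_heads * d_head, d_model)` source layout and a per-head `(d_model, d_head)` target layout, and that K/V fall back to `n_heads` when `n_key_value_heads` is unset.

```python
from typing import List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    # Swap rows and columns of a rectangular list-of-lists matrix.
    return [list(col) for col in zip(*m)]


def split_heads(w: Matrix, n_heads: int) -> List[Matrix]:
    # w: (n_heads * d_head, d_model), the nn.Linear weight layout.
    # Returns n_heads blocks of shape (d_model, d_head) (assumed target layout).
    d_head = len(w) // n_heads
    return [transpose(w[h * d_head:(h + 1) * d_head]) for h in range(n_heads)]


def merge_heads(heads: List[Matrix]) -> Matrix:
    # Inverse of split_heads: back to (n_heads * d_head, d_model).
    return [row for block in heads for row in transpose(block)]


def kv_head_count(n_heads: int, n_key_value_heads: Optional[int] = None) -> int:
    # Assumed fallback: without a GQA setting, K/V use as many heads as Q.
    return n_key_value_heads if n_key_value_heads is not None else n_heads


d_model, d_head, n_heads = 4, 2, 4
n_kv = kv_head_count(n_heads, n_key_value_heads=2)

# Synthetic weights with distinct values so any row/column mix-up shows up.
w_q = [[float(r * d_model + c) for c in range(d_model)] for r in range(n_heads * d_head)]
w_k = [[float(-(r * d_model + c)) for c in range(d_model)] for r in range(n_kv * d_head)]

q_heads = split_heads(w_q, n_heads)
k_heads = split_heads(w_k, n_kv)

print(len(q_heads), len(q_heads[0]), len(q_heads[0][0]))  # 4 4 2
print(len(k_heads), len(k_heads[0]), len(k_heads[0][0]))  # 2 4 2

# Round-trip: converting and converting back must reproduce the originals.
assert merge_heads(q_heads) == w_q
assert merge_heads(k_heads) == w_k
```

With GQA, Q splits into `n_heads` blocks while K and V split into only `n_key_value_heads` blocks. The head-count tests are there to catch exactly this asymmetry.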
The failing check is the Othello_GPT notebook at Cell 4, the … Running the exact CI command locally (…
Thanks for taking this on @RecreationalMath! Your tests look great, I will merge them shortly. The Othello GPT notebook failure is a known issue; I reran the CI test and your code passed on a second run. Thanks for documenting the source of the problem, I'll see if we can't get that cleaned up.
Description
Adds a unit test suite for `OlmoeArchitectureAdapter` under `tests/unit/model_bridge/supported_architectures/`, following the existing adapter-test pattern (modeled on the qwen2 / qwen3_moe / mixtral suites). It needs no model downloads or real checkpoints. Instead, it uses tiny programmatic `TransformerBridgeConfig` objects, plus small synthetic tensors and a fake attention module for the behavioral tests, so it runs on CPU in seconds.

The suite (50 tests) covers:
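- Config defaults (… `final_rms=False`, eager attention).
- QKVO weight conversions with GQA head counts (… `n_key_value_heads` fallback).
- Component mapping, including Q/K-norm and the MoE bridge (… `gate` router).
- Factory dispatch (… `select_architecture_adapter`).
- Numerical conversion round-trips.
- GQA forward hook shapes with Q/K-norm wired (… `n_heads` while K/V surface `n_key_value_heads`).
- `setup_component_testing`: rotary-embedding wiring, eager-attention forcing, and robustness on a minimal HF model.
- `prepare_model` in-place-clamp patching. It is a no-op when `clip_qkv` is unset. When `clip_qkv` is set, it wraps the attention forward so that `clip_qkv` is disabled during the call and restored afterwards. It also tolerates a model without layers.
- Architecture guards.

Contributes to #1302 (OLMoE checkbox).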
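The `prepare_model` behavior listed above amounts to a save/disable/restore wrapper around each attention forward. Below is a minimal sketch of that pattern. It assumes an HF-style `model.model.layers[i].self_attn` layout whose attention modules share `model.config`. `patch_clip_qkv` and `FakeAttention` are hypothetical names, and this is not the adapter's actual implementation.

```python
import functools
from types import SimpleNamespace


class FakeAttention:
    """Hypothetical stand-in for an HF attention module that reads config.clip_qkv."""

    def __init__(self, config):
        self.config = config

    def forward(self, x):
        # Report the clip_qkv value visible during the call.
        return x, self.config.clip_qkv


def patch_clip_qkv(model):
    """Hypothetical sketch of the described prepare_model patching, not the adapter's code."""
    if getattr(model.config, "clip_qkv", None) is None:
        return model  # no-op when clip_qkv is unset
    # Assumed HF-style layout: model.model.layers[i].self_attn; tolerate its absence.
    layers = getattr(getattr(model, "model", None), "layers", None) or []
    for layer in layers:
        attn = layer.self_attn
        original = attn.forward

        @functools.wraps(original)
        def wrapped(*args, _orig=original, _cfg=attn.config, **kwargs):
            saved = _cfg.clip_qkv
            _cfg.clip_qkv = None  # disable the in-place clamp for this call only
            try:
                return _orig(*args, **kwargs)
            finally:
                _cfg.clip_qkv = saved  # restore, even if forward raises

        attn.forward = wrapped
    return model


# Usage: one patched layer sharing the model config, as HF modules do.
cfg = SimpleNamespace(clip_qkv=8.0)
layer = SimpleNamespace(self_attn=FakeAttention(cfg))
model = patch_clip_qkv(SimpleNamespace(config=cfg, model=SimpleNamespace(layers=[layer])))
out, seen = layer.self_attn.forward("x")
print(seen, cfg.clip_qkv)  # None 8.0

# A model without layers is returned unchanged instead of raising.
bare = SimpleNamespace(config=SimpleNamespace(clip_qkv=1.0))
assert patch_clip_qkv(bare) is bare
```

The `try`/`finally` restores `clip_qkv` even if the wrapped forward raises. Binding `original` as a default argument avoids Python's late-binding closure pitfall, where every wrapper would otherwise call the last layer's forward.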
Type of change
Checklist: