Add OLMoE architecture adapter tests #1333

Merged: jlarson4 merged 2 commits into TransformerLensOrg:dev from RecreationalMath:olmoe-adapter-test on May 27, 2026

@RecreationalMath
Contributor

Description

Adds a unit test suite for OlmoeArchitectureAdapter under tests/unit/model_bridge/supported_architectures/, following the existing adapter-test pattern (modeled on the qwen2 / qwen3_moe / mixtral suites). It needs no model downloads or real checkpoints: it uses tiny programmatic TransformerBridgeConfig objects, plus small synthetic tensors and a fake attention module for the behavioral tests, so it runs on CPU in seconds.

The suite (50 tests) covers:

  • Adapter config defaults (RMSNorm, rotary, gated MoE MLP, final_rms=False, eager attention).
  • Weight conversions: the four QKVO weights (OLMoE has no projection biases), with GQA-aware head counts and the no-n_key_value_heads fallback.
  • Numerical round-trips: the rearrange conversions are actually run on synthetic HF-shaped weight tensors, asserting the split-head output shapes and lossless reversion.
  • Component-mapping structure, bridge types, and HF module paths, including the Q/K-norm RMSNorm submodules and the MoE bridge with its gate router.
  • Factory registration and dispatch via select_architecture_adapter.
  • GQA forward hook shapes: a fake attention module (carrying OLMoE's pre-reshape Q/K norms) wired into the bridge confirms Q surfaces n_heads while K/V surface n_key_value_heads.
  • setup_component_testing rotary-embedding wiring, eager-attention forcing, and robustness on a minimal HF model.
  • prepare_model in-place-clamp patching: a no-op when clip_qkv is unset; when clip_qkv is set, it wraps attention forward so that clip_qkv is disabled during the call and restored afterwards. It also tolerates a model without layers.
  • Architecture guards against drift.
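
As an illustration of the numerical round-trip bullet above, here is a minimal sketch of a split-head conversion and its inverse on synthetic HF-shaped weights. It is not the adapter's code: NumPy stands in for the rearrange conversions, the layout `(n_heads * d_head, d_model) -> (n_heads, d_model, d_head)` is an assumption about the target shape, and `split_heads`, `merge_heads`, and the tiny dimensions are hypothetical names and values chosen for the example.

```python
import numpy as np


def split_heads(weight, n_heads):
    """Reshape (n_heads * d_head, d_model) into (n_heads, d_model, d_head)."""
    rows, d_model = weight.shape
    assert rows % n_heads == 0, "row count must be divisible by the head count"
    d_head = rows // n_heads
    return weight.reshape(n_heads, d_head, d_model).transpose(0, 2, 1)


def merge_heads(weight):
    """Inverse of split_heads: (n_heads, d_model, d_head) back to 2-D."""
    n_heads, d_model, d_head = weight.shape
    return weight.transpose(0, 2, 1).reshape(n_heads * d_head, d_model)


# Hypothetical tiny dimensions; with GQA, K/V use fewer heads than Q.
d_model, n_heads, n_kv_heads, d_head = 16, 4, 2, 4

rng = np.random.default_rng(0)
q_hf = rng.standard_normal((n_heads * d_head, d_model))
k_hf = rng.standard_normal((n_kv_heads * d_head, d_model))

q_split = split_heads(q_hf, n_heads)
k_split = split_heads(k_hf, n_kv_heads)

# Split-head shapes follow the per-projection head count.
assert q_split.shape == (n_heads, d_model, d_head)
assert k_split.shape == (n_kv_heads, d_model, d_head)

# Reverting the conversion recovers the original weights exactly.
assert np.array_equal(merge_heads(q_split), q_hf)
assert np.array_equal(merge_heads(k_split), k_hf)
```

Because the conversion only reorders elements, the round-trip can be checked with exact equality rather than a floating-point tolerance.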

Contributes to #1302 (OLMoE checkbox).
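
The disable-and-restore behavior described in the prepare_model bullet can be sketched in plain Python. This is an illustration under stated assumptions, not the adapter's implementation: `FakeAttention` and `patch_clip_qkv` are hypothetical names, and the sketch assumes the clamp value lives on the attention module's `config.clip_qkv`.

```python
from types import SimpleNamespace


class FakeAttention:
    """Hypothetical stand-in for an attention module that reads config.clip_qkv."""

    def __init__(self, config):
        self.config = config

    def forward(self, x):
        # Report the clip value visible during the call so the patch can be checked.
        return self.config.clip_qkv


def patch_clip_qkv(attn):
    """Wrap attn.forward so clip_qkv is None during the call and restored after.

    Returns False (and changes nothing) when clip_qkv is unset.
    """
    if getattr(attn.config, "clip_qkv", None) is None:
        return False

    original_forward = attn.forward

    def wrapped_forward(*args, **kwargs):
        saved = attn.config.clip_qkv
        attn.config.clip_qkv = None
        try:
            return original_forward(*args, **kwargs)
        finally:
            # Restore even if the wrapped forward raises.
            attn.config.clip_qkv = saved

    attn.forward = wrapped_forward
    return True


clipped = FakeAttention(SimpleNamespace(clip_qkv=8.0))
assert patch_clip_qkv(clipped) is True
assert clipped.forward(0) is None          # disabled during the call
assert clipped.config.clip_qkv == 8.0      # restored afterwards

unclipped = FakeAttention(SimpleNamespace(clip_qkv=None))
assert patch_clip_qkv(unclipped) is False  # no-op when clip_qkv is unset
```

The `try`/`finally` keeps the configuration consistent even when the forward call fails, which is the property a restore-afterwards test would want to pin down.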

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Download-free unit suite for OlmoeArchitectureAdapter (50 tests, no real
checkpoints), following the existing adapter-test pattern. Covers config
defaults, QKVO weight conversions with GQA head counts, component mapping
(including Q/K-norm and the MoE bridge), factory dispatch, numerical
conversion round-trips, GQA forward hook shapes with Q/K-norm wired,
setup_component_testing, prepare_model in-place-clamp patching, and
architecture guards.
@RecreationalMath
Contributor Author

The failing check is the Othello_GPT notebook at Cell 4, the import transformer_lens cell, which is the same cell flagged previously. The mechanism here is a bit different from the rate-limit flake noted there: nbval fails it with Unexpected output fields from running code: {'stderr'}. The cell's saved output is empty, but in the CI environment the import emits a warning on stderr, so nbval sees an extra output field and fails the cell.

Running the exact CI command locally (pytest --nbval-sanitize-with demos/doc_sanitize.cfg demos/Othello_GPT.ipynb) passes all 13 cells with the same package versions CI used, so the stderr only appears in the CI runner environment. This PR only adds a new file under tests/unit/, so it can't affect that cell.
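
The failure mode can be reproduced in miniature. The sketch below is not nbval's implementation; `observed_fields` and `unexpected_fields` are hypothetical helpers that only mimic the idea of comparing the output streams a cell produced against the ones saved in the notebook.

```python
import contextlib
import io
import sys


def observed_fields(cell):
    """Run a callable 'cell' and return the names of the streams it wrote to."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        cell()
    fields = set()
    if out.getvalue():
        fields.add("stdout")
    if err.getvalue():
        fields.add("stderr")
    return fields


def unexpected_fields(cell, saved):
    """Streams the cell produced that are absent from the saved output."""
    return observed_fields(cell) - saved


def quiet_import():
    # Stands in for an import that emits nothing (the local run).
    pass


def noisy_import():
    # Stands in for an import that emits a warning on stderr (the CI run).
    print("UserWarning: example warning text", file=sys.stderr)


# The saved cell output is empty, so any stream at all is unexpected.
assert unexpected_fields(quiet_import, saved=set()) == set()
assert unexpected_fields(noisy_import, saved=set()) == {"stderr"}
```

The same cell therefore passes or fails depending only on whether the environment triggers the stderr write, independent of the code under review.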

@jlarson4
Collaborator

Thanks for taking this on @RecreationalMath! Your tests look great, I will merge them shortly.

The Othello GPT notebook failure is a known issue. I reran the CI test and your code passed on the second run. Thanks for documenting the source of the problem; I'll see if we can get that cleaned up.

@jlarson4 merged commit 64c6375 into TransformerLensOrg:dev on May 27, 2026
47 of 48 checks passed
2 participants