Worldscape-MoE: A Unified Mixture-of-Experts World Model for Scalable Heterogeneous Action Control
Jianjie Fang1,*, Yongyan Xu1,*, Ziyou Wang1,*, Chen Gao1,*,†, Yuchao Huang1, Zhaolu Wang1, Rongze Tang1, Mingyuan Jia1, Baining Zhao1, Weichen Zhang1, Xin Zhang2, Haisheng Su2, Yu Shang1, Wei Wu2, Xinlei Chen1, Yong Li1,†
1Tsinghua University 2Manifold AI
*Equal contribution. †Corresponding authors.
Code will be released on July 20, 2026. Please stay tuned.
Worldscape-MoE is a Mixture-of-Experts world model for scalable heterogeneous action control. It unifies multiple control interfaces, including camera trajectories, robot actions, and hand-joint signals, within a shared world-dynamics backbone instead of treating each control modality as an isolated modeling problem.
Built on Diffusion Transformers, Worldscape-MoE combines modality-aware control injection, shared and control-specific experts, and a progressive MoE tuning strategy to absorb heterogeneous action supervision while preserving a shared model of world dynamics. Across locomotion, robotic manipulation, and egocentric hand control, it shows that heterogeneous supervision can improve rather than interfere with individual control capabilities, while supporting out-of-distribution generalization and continual extension to new action modalities.
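Since the code is not yet released, the following is only a toy sketch of how a layer with shared and control-specific experts might be organized. Every name here (`ToyMoELayer`, `set_stage`, `add_modality`) is hypothetical, and so are the routing rule (one expert selected by modality label) and the two-stage freezing schedule. These are assumptions made for illustration and are not taken from the Worldscape-MoE implementation. Plain Python lists stand in for tensors, and a single linear map stands in for each expert.

```python
import random

# Control interfaces named in the text; the string labels are illustrative.
MODALITIES = ("camera", "robot", "hand")


def make_linear(dim, rng, scale=0.1):
    """Return a dim x dim weight matrix as nested lists (stand-in for an expert)."""
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


def apply_linear(weights, x):
    """Matrix-vector product for a single token vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyMoELayer:
    """Hypothetical layer: one shared expert plus one expert per control modality."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.shared = make_linear(dim, rng)
        self.specific = {m: make_linear(dim, rng) for m in MODALITIES}
        # Names of parameter groups that a training loop would be allowed to update.
        self.trainable = {"shared"}

    def set_stage(self, stage):
        """Assumed schedule: stage 1 tunes the shared expert, stage 2 the specific ones."""
        if stage == 1:
            self.trainable = {"shared"}
        elif stage == 2:
            self.trainable = set(self.specific)
        else:
            raise ValueError(f"unknown stage: {stage}")

    def add_modality(self, name, seed=1):
        """Continual extension: register an expert for a new control modality."""
        self.specific[name] = make_linear(self.dim, random.Random(seed))

    def forward(self, token, modality):
        """Residual update: the shared expert always runs, one specific expert is added."""
        shared_out = apply_linear(self.shared, token)
        specific_out = apply_linear(self.specific[modality], token)
        return [t + s + c for t, s, c in zip(token, shared_out, specific_out)]


layer = ToyMoELayer(dim=4)
token = [1.0, 0.0, -1.0, 0.5]
out_camera = layer.forward(token, "camera")  # shared + camera expert
out_robot = layer.forward(token, "robot")    # shared + robot expert

layer.set_stage(2)           # shared expert frozen, specific experts trainable
layer.add_modality("gaze")   # hypothetical new control interface
out_gaze = layer.forward(token, "gaze")
```

In this sketch the shared expert contributes to every token regardless of control type, which is one simple way to keep a common dynamics pathway while letting each modality add its own correction. A learned router or soft gating could replace the hard modality lookup.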
We are preparing the codebase for public release. The training and inference code will be open-sourced on July 20, 2026.
If you find this project useful, please consider citing:
@misc{fang2026worldscapemoe,
title = {Worldscape-MoE: A Unified Mixture-of-Experts World Model for Scalable Heterogeneous Action Control},
author = {Fang, Jianjie and Xu, Yongyan and Wang, Ziyou and Gao, Chen and Huang, Yuchao and Wang, Zhaolu and Tang, Rongze and Jia, Mingyuan and Zhao, Baining and Zhang, Weichen and Zhang, Xin and Su, Haisheng and Shang, Yu and Wu, Wei and Chen, Xinlei and Li, Yong},
year = {2026},
note = {Project page: https://worldscape-moe.com}
}