📝 Publications
Selected Publications

[Model Architecture] Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Yujin Han, Zhekai Chen, Jiayu Wang, Difan Zou, Xihui Liu, Yingya Zhang, Yu Liu, Hongming Shan
- ProMoE is an MoE framework featuring a two-step router with explicit routing guidance that promotes expert specialization.

[Video Generation] DreamRelation: Relation-Centric Video Customization
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Biao Gong, Longxiang Tang, Xiang Wang, Haonan Qiu, Hengjia Li, Shuai Tan, Yingya Zhang, Hongming Shan
- DreamRelation is the first relational video customization method that personalizes user-specified relations.

[Video Generation] DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan
- DreamVideo is the first method that generates customized videos from a few static images of the desired subject and a few videos of target motion.

[Continual Learning] Online Prototype Learning for Online Continual Learning
Yujie Wei, Jiaxin Ye, Zhizhong Huang, Junping Zhang, Hongming Shan
- OnPro is the first work to identify shortcut learning as the key limiting factor for online continual learning, offering new insights into why online learning models fail to generalize well.

[Video Generation] DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
Yujie Wei, Xinyu Liu, Shiwei Zhang, Hangjie Yuan, Jinbo Xing, Zhekai Chen, Xiang Wang, Haonan Qiu, Rui Zhao, Yutong Feng, Ruihang Chu, Yingya Zhang, Yike Guo, Xihui Liu, Hongming Shan
- DreamVideo-Omni is a unified framework enabling harmonious multi-subject customization with omni-motion control via omni-motion and identity supervised finetuning as well as latent identity reward feedback learning.

[Video Generation] DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Xiang Wang, Haonan Qiu, Rui Zhao, Yutong Feng, Feng Liu, Zhizhong Huang, Jiaxin Ye, Yingya Zhang, Hongming Shan
- DreamVideo-2 is the first zero-shot (tuning-free) framework that generates customized videos with specified subjects and motion trajectories.

Collaborative Publications

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu
- FreeScale proposes a tuning-free inference paradigm to enable higher-resolution visual generation via scale fusion.

Timestep Embedding Tells: It’s Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan
- TeaCache is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps.

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Zhangjie Wu, Junhao Zhang, Yingya Zhang, Mike Zheng Shou
- EvolveDirector explores the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources.

InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni
- InstructVideo is the first research attempt that instructs video diffusion models with human feedback.

Hierarchical Spatio-Temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang
- HiGen improves text-to-video (T2V) performance by decoupling the spatial and temporal factors of videos at both the structure level and the content level.

- [CVPR 2026] SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation, Shuai Tan, Biao Gong, Yujie Wei, Shiwei Zhang, Zhuoxin Liu, Dandan Zheng, Jingdong Chen, Yan Wang, Hao Ouyang, Kecheng Zheng, Yujun Shen.
- [NeurIPS 2025 Spotlight] RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation, Boyuan Cao, Jiaxin Ye, Yujie Wei, Hongming Shan.
- [NeurIPS 2025] TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation, Zhekai Chen, Ruihang Chu, Yukang Chen, Shiwei Zhang, Yujie Wei, Yingya Zhang, Xihui Liu.
- [ICCV 2025] PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation, Hengjia Li, Haonan Qiu, Shiwei Zhang, Xiang Wang, Yujie Wei, Zekun Li, Yingya Zhang, Boxi Wu, Deng Cai.
- [ACM MM 2023] Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition, Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan.
- [ICASSP 2023] Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition, Jiaxin Ye, Xin-Cheng Wen, Yujie Wei, Yong Xu, Kunhong Liu, Hongming Shan.