ResWM: Residual-Action World Model for Visual RL

Jseen Zhang, Gabriel Adineera, Jinzhou Tan, Jinoh Kim
University of California, San Diego, Texas A&M University-Commerce
arXiv (2026)

📝 Paper Summary

Model-Based Reinforcement Learning (MBRL) Visual Control
ResWM stabilizes visual model-based RL by predicting incremental residual actions rather than absolute actions and conditioning latent dynamics on explicit observation differences.
Core Problem
Traditional world models condition latent dynamics on absolute actions, which ignores the smoothness of physical control, leads to high-variance policy learning, and causes oscillatory or erratic behavior in continuous control tasks.
Why it matters:
  • Erratic control signals increase mechanical wear and energy consumption in real-world robotics
  • High-variance action spaces make long-horizon planning inefficient and optimization unstable
  • Standard frame-stacking often fails to explicitly capture the temporal dynamics required for precise control adjustments
Concrete Example: In a robotic continuous control task, a standard policy might output wildly different absolute commands (e.g., +1.0 then -0.8) between frames to maintain position, causing 'chattering.' ResWM instead predicts small residuals (e.g., +0.05), inherently enforcing smooth motion.
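The residual reparameterization in the example above can be sketched as a thin wrapper around a policy's output. This is a minimal illustration, not the paper's implementation: the class name, the clip bound `max_residual`, and the action range are all assumptions.

```python
import numpy as np

class ResidualActionWrapper:
    """Hypothetical sketch: convert a policy's residual output into an
    absolute command, enforcing the smoothness prior the summary
    describes. All names and bounds here are assumptions."""

    def __init__(self, action_dim, max_residual=0.1, low=-1.0, high=1.0):
        self.prev_action = np.zeros(action_dim)  # last absolute command
        self.max_residual = max_residual         # assumed increment bound
        self.low, self.high = low, high          # assumed action range

    def step(self, residual):
        # Bound the increment so consecutive commands stay close,
        # which rules out the +1.0 / -0.8 chattering pattern.
        delta = np.clip(residual, -self.max_residual, self.max_residual)
        self.prev_action = np.clip(self.prev_action + delta, self.low, self.high)
        return self.prev_action
```

Under this sketch, a large requested jump is clipped to the residual bound, so the executed trajectory drifts smoothly instead of oscillating.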
Key Novelty
Residual-Action World Model (ResWM)
  • Reparameterizes the control variable from absolute actions to residual actions (incremental changes), embedding a strong temporal smoothness prior that simplifies the search space
  • Introduces an Observation Difference Encoder (ODL) that explicitly encodes the difference between adjacent frames, creating a dynamics-aware latent state that aligns with the residual action prediction
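The observation-difference idea in the second bullet can be sketched as follows. This is an assumed illustration of the concept only: `encoder` stands in for an arbitrary feature extractor, and the concatenation scheme is a guess at how a dynamics-aware latent might be formed, not the paper's architecture.

```python
import numpy as np

def encode_with_difference(obs_t, obs_prev, encoder):
    """Hypothetical sketch: feed the latent model both the current
    frame's features and features of the explicit frame difference,
    rather than relying on frame stacking alone."""
    diff = obs_t - obs_prev  # explicit temporal signal between adjacent frames
    return np.concatenate([encoder(obs_t), encoder(diff)])
```

A dynamics-aware latent built this way aligns naturally with residual-action prediction: the difference channel carries the same "how things changed" information that a residual action expresses on the control side.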
Evaluation Highlights
  • Outperforms Dreamer and TD-MPC on the DeepMind Control Suite, with an average score of 925.0 versus Dreamer's 820.5 (figures estimated from the reported gains)
  • Achieves superior stability on the Quadruped Walk task with a score of 715 at 1M steps (compared to 690 for baselines)
  • Demonstrates 0.96 normalized mean score on Atari, surpassing recent efficient baselines like TACO (0.88 normalized)
Breakthrough Assessment
8/10
Simple yet highly effective reparameterization that addresses a fundamental inefficiency in continuous control MBRL. Strong empirical gains in stability and smoothness make it practically valuable for robotics.