
Towards General-Purpose Model-Free Reinforcement Learning

Scott Fujimoto, P. D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat
Meta Fundamental AI Research
International Conference on Learning Representations (2025)
Tags: RL · Benchmark

📝 Paper Summary

Topics: General-Purpose Reinforcement Learning · Representation Learning for RL · Model-Free RL with Model-Based Objectives
MR.Q achieves general-purpose performance across diverse RL benchmarks using a single set of hyperparameters by leveraging model-based auxiliary losses to learn representations for a lightweight model-free agent.
Core Problem
Deep RL algorithms are typically highly specialized, requiring distinct hyperparameters and architectures for different domains (e.g., discrete vs. continuous), while general model-based methods are computationally expensive and complex.
Why it matters:
  • Current 'general' agents like DreamerV3 require massive models and slow planning procedures, limiting real-time applicability
  • The fragmentation of RL algorithms (e.g., DQN for Atari, TD3 for control) hinders the development of universal decision-making systems
  • Practitioners must perform extensive, domain-specific tuning to get RL algorithms to work on new tasks
Concrete Example: Standard algorithms like Rainbow (Atari) and TD3 (MuJoCo) share almost no hyperparameter values (e.g., learning rates differ by orders of magnitude: 6.25e-5 vs 1e-3). MR.Q solves both benchmarks with a single configuration.
Key Novelty
Model-based Representations for Q-learning (MR.Q)
  • Decouples representation learning from policy learning: uses auxiliary model-based losses (predicting reward, dynamics, termination) to shape a shared latent embedding
  • Replaces the expensive planning of model-based RL with a standard, lightweight model-free critic and actor that operate directly on the learned embeddings
  • Optimizes for an approximately linear relationship between the learned features and the value function, enabling stable learning across vastly different observation and action spaces
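The three auxiliary objectives above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the linear encoder, weight shapes, and loss weighting (here simply summed) are hypothetical placeholders; the point is only that the embedding is shaped by predicting reward, next latent state, and termination, rather than by a value-learning or pixel-reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not from the paper: observation dim, latent dim, batch size.
OBS_DIM, LATENT, BATCH = 8, 16, 32

def encode(obs, W_enc):
    """Toy linear-tanh encoder standing in for MR.Q's learned embedding."""
    return np.tanh(obs @ W_enc)

# Fake batch of transitions (obs, reward, next_obs, done flags).
obs      = rng.normal(size=(BATCH, OBS_DIM))
next_obs = rng.normal(size=(BATCH, OBS_DIM))
reward   = rng.normal(size=(BATCH, 1))
done     = rng.integers(0, 2, size=(BATCH, 1)).astype(float)

W_enc  = rng.normal(size=(OBS_DIM, LATENT)) * 0.1
# One linear head per model-based auxiliary target.
W_rew  = rng.normal(size=(LATENT, 1)) * 0.1       # reward prediction
W_dyn  = rng.normal(size=(LATENT, LATENT)) * 0.1  # latent dynamics
W_term = rng.normal(size=(LATENT, 1)) * 0.1       # termination

z, z_next = encode(obs, W_enc), encode(next_obs, W_enc)

# Reward loss: MSE between predicted and observed reward.
loss_reward = np.mean((z @ W_rew - reward) ** 2)
# Dynamics loss: predict the *next latent*, not raw pixels/observations.
loss_dynamics = np.mean((z @ W_dyn - z_next) ** 2)
# Termination loss: binary cross-entropy on episode-end flags.
p = 1.0 / (1.0 + np.exp(-(z @ W_term)))
loss_term = -np.mean(done * np.log(p + 1e-8) + (1 - done) * np.log(1 - p + 1e-8))

aux_loss = loss_reward + loss_dynamics + loss_term
```

In the actual algorithm these gradients shape the encoder, while the model-free actor and critic consume the resulting embedding; no rollout or planning step ever uses the learned dynamics head.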
Evaluation Highlights
  • ~8x higher Evaluation FPS (1.9k) on Gym HalfCheetah compared to DreamerV3 (236) while maintaining competitive performance
  • Achieves strong performance on Atari with ~4.4M parameters, compared to 187.3M for DreamerV3 (a ~40x reduction in model size)
  • Outperforms state-of-the-art generalist baselines (TD-MPC2, DreamerV3) on both DeepMind Control (DMC) Proprioceptive and Visual benchmarks
Breakthrough Assessment
8/10
Successfully challenges the dominance of complex model-based methods for generalist RL. It demonstrates that lightweight model-free agents can be general-purpose if the representation is grounded in dynamics.