SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, K. Subramanian, Peter R. Wurman, J. Choo, Peter Stone, Takuma Seno
Coventry University
International Conference on Learning Representations (2025)
RL Benchmark

📝 Paper Summary

Network Architecture for RL · Scaling Laws in RL
SimBa is a neural network architecture that enables scaling up Deep Reinforcement Learning models to millions of parameters without overfitting by explicitly enforcing a bias toward simpler functions.
Core Problem
Increasing the size of neural networks in deep reinforcement learning (RL) typically degrades performance due to overfitting, unlike in computer vision or NLP, where larger models generally perform better.
Why it matters:
  • Current RL methods fail to leverage the scaling laws that have driven breakthroughs in other fields (e.g., LLMs), limiting the complexity of behaviors agents can learn.
  • Standard large networks (MLPs) fit noise in the RL training data rather than generalizable patterns, causing training to collapse as parameter count increases.
  • Existing scaling attempts often rely on computationally expensive components (like spectral normalization) or complex training protocols, making them inefficient.
Concrete Example: When scaling a Soft Actor-Critic (SAC) agent from 0.1M to 17M parameters on the 'Humanoid' task, the standard MLP architecture's performance drops significantly. In contrast, SimBa's performance improves as the model size increases.
Key Novelty
Architectural induction of Simplicity Bias
  • Uses a specific arrangement of normalization and residual connections to ensure the network prefers 'simple' (low-frequency) functions at initialization.
  • Maintains a direct linear path from input to output, adding non-linearity only via residual blocks, which encourages the model to ignore noise and focus on dominant features.
  • Does not require new loss functions or training algorithms; it is a drop-in architectural replacement for standard MLPs.
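The architectural recipe above (a normalized input, a linear embedding, residual blocks that add non-linearity around an otherwise linear path, and a final normalization) can be sketched in plain numpy. This is a minimal illustration, not the authors' implementation: the class names, widths, and initializations are my own, and the paper's running-statistics observation normalization is approximated here with an ordinary per-sample LayerNorm.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the feature dimension to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

class SimBaBlock:
    """One residual block: pre-LayerNorm -> Linear -> ReLU -> Linear,
    added back onto the input, so the skip path stays purely linear."""
    def __init__(self, dim, hidden, rng):
        self.w1 = rng.standard_normal((dim, hidden)) * np.sqrt(2.0 / dim)
        self.w2 = rng.standard_normal((hidden, dim)) * np.sqrt(2.0 / hidden)

    def __call__(self, x):
        h = np.maximum(layer_norm(x) @ self.w1, 0.0)  # LN -> Linear -> ReLU
        return x + h @ self.w2                        # residual add

class SimBaEncoder:
    """Input norm -> linear embed -> N residual blocks -> final LayerNorm.
    Widths and depth are illustrative, not the paper's settings."""
    def __init__(self, obs_dim, dim=64, hidden=256, depth=2, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.standard_normal((obs_dim, dim)) * np.sqrt(1.0 / obs_dim)
        self.blocks = [SimBaBlock(dim, hidden, rng) for _ in range(depth)]

    def __call__(self, obs):
        # Stand-in for the paper's running-statistics observation norm.
        x = layer_norm(obs) @ self.embed
        for blk in self.blocks:
            x = blk(x)
        return layer_norm(x)

enc = SimBaEncoder(obs_dim=17)  # e.g. a small proprioceptive observation
feats = enc(np.random.default_rng(1).standard_normal((8, 17)))
print(feats.shape)  # (8, 64)
```

Because every non-linearity lives inside a residual branch, zeroing those branches leaves a plain linear map from input to output, which is one way to see the "prefers simple functions at initialization" claim; the encoder then drops in wherever a standard MLP trunk would go.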
Evaluation Highlights
  • SimBa integrated into SAC matches or surpasses state-of-the-art methods across 51 tasks in DMC, MyoSuite, and HumanoidBench.
  • Scaling parameters from 0.1M to 17M consistently improves performance with SimBa, whereas standard MLPs degrade.
  • Achieves these results without computationally intensive components like self-supervised objectives, planning, or replay ratio scaling.
Breakthrough Assessment
8/10
Directly addresses the long-standing scaling problem in RL, where bigger networks hurt performance, with a simple architectural fix that applies broadly across RL algorithms.