โ† Back to Paper List

Towards Deployable RL -- What's Broken with RL Research and a Potential Fix

Shie Mannor, Aviv Tamar
Technion and Nvidia Research, Technion
arXiv (2023)
RL Benchmark

๐Ÿ“ Paper Summary

Reinforcement Learning · Methodology · Research Practice & Ethics
The authors argue that current RL research over-optimizes performance on arbitrary benchmarks and pursues theory detached from practice; they propose a shift toward 'deployable RL' grounded in real-world challenges and whole-system life-cycle design.
Core Problem
RL research is stagnating due to an obsession with sample complexity on made-up benchmarks (Atari/MuJoCo) that ignore system-level engineering issues like stability, debugging, and integration.
Why it matters:
  • Current benchmarks like OpenAI Gym abstract away critical system-design issues (state/reward definition), widening the gap between academic success and real-world utility
  • Emphasis on sample complexity ignores that compute is often cheap relative to engineering effort in practice
  • Lack of experimental rigor and reporting on failure cases makes it impossible for industry to assess stability or development costs
Concrete Example: While deep RL solved Atari games in 2015, the simple 2D ProcGen Maze benchmark remains unsolved, and isolated impressive results often mask an instability that blocks industrial adoption.
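The "abstracting away state and reward definition" critique can be made concrete with a minimal sketch of the Gym-style interface (a toy stand-in, not actual OpenAI Gym code; all names here are illustrative): the environment hands the algorithm a ready-made observation and scalar reward, so the hard system-design questions a practitioner faces in deployment, namely what the state representation should be and what behavior should be rewarded, are already answered by the benchmark author.

```python
# Toy stand-in for a Gym-style environment. The state transition and
# reward function are fixed design decisions baked into the benchmark,
# invisible to (and untouchable by) the RL algorithm consuming it.

class BenchmarkEnv:
    """Illustrative 1D environment: reach position +3 to succeed."""

    def __init__(self):
        self._position = 0

    def reset(self):
        self._position = 0
        return self._position  # observation: pre-defined by the benchmark

    def step(self, action):
        # In a real deployment, choosing this dynamics model and this
        # reward would be the bulk of the engineering work.
        self._position += 1 if action == 1 else -1
        reward = 1.0 if self._position >= 3 else 0.0
        done = self._position >= 3 or self._position <= -3
        return self._position, reward, done


env = BenchmarkEnv()
obs = env.reset()
done = False
steps = 0
while not done:
    obs, reward, done = env.step(1)  # trivial policy: always move right
    steps += 1
# With this policy the goal is reached in exactly 3 steps.
```

The point of the sketch is what is missing: nothing in the `reset`/`step` loop forces the researcher to confront observability, reward mis-specification, or integration with a surrounding system, which is precisely the gap the authors identify.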
Key Novelty
Shift from 'Generalist Agents' to 'Deployable RL' via Community Challenges
  • Replace algorithm-vs-algorithm benchmark comparisons with community-sponsored 'challenges': specific problems where solving the task matters more than the method used
  • Introduce 'contributed challenges' as a credit-worthy publication type, rewarding the creation of platforms and communities around real-world problems
  • Prioritize 'design-patterns oriented research' that addresses system life-cycle issues (testing, debugging, maintenance) over pure algorithmic performance
Evaluation Highlights
  • This is a position paper; it does not contain quantitative experimental results.
  • The paper qualitatively evaluates the state of the field, identifying 5 key broken practices: overfitting to benchmarks, wrong focus, detached theory, uneven playing grounds, and lack of rigor.
Breakthrough Assessment
8/10
A highly influential critique that accurately diagnoses the gap between academic RL and industrial application, proposing concrete structural changes to how the community values research.