Evaluation Setup
Qualitative analysis of the RL research landscape
Metrics:
- Statistical methodology: Not explicitly reported in the paper
Main Takeaways
- Current RL research overfits to benchmarks (Atari/MuJoCo) on which performance does not correlate with real-world value
- Focus on sample complexity is misplaced; engineering effort and data acquisition costs often matter more
- Theory is often detached from practice, focusing on pessimistic guarantees (regret) or irrelevant models (small finite state spaces) rather than explaining observed phenomena
- Experimental rigor is lacking: failure cases are hidden, and 'weight class' (compute resources) is rarely reported, which confounds comparisons between methods
- To address these problems, the field should reward 'Contributed Challenges' and 'Design Patterns' that address system life-cycle issues
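
The takeaway on experimental rigor can be illustrated with a small reporting sketch. This is not a format proposed in the paper. It is a minimal example of a result record that keeps the compute 'weight class' and the failed seeds next to the headline numbers. All names here (`RunReport`, `summarize`, `gpu_hours`, `failure_threshold`) and all values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RunReport:
    """Hypothetical record for one algorithm evaluated over several seeds."""
    algorithm: str
    gpu_hours: float                   # compute 'weight class' (assumed unit)
    env_steps: int                     # environment interactions per seed
    returns_by_seed: dict[int, float]  # final return for every seed, none dropped


def summarize(report: RunReport, failure_threshold: float) -> dict:
    """Summarize all seeds and list the ones that fell below the threshold."""
    returns = list(report.returns_by_seed.values())
    failed = sorted(
        seed for seed, ret in report.returns_by_seed.items()
        if ret < failure_threshold
    )
    return {
        "algorithm": report.algorithm,
        "gpu_hours": report.gpu_hours,
        "env_steps": report.env_steps,
        "n_seeds": len(returns),
        "mean_return": mean(returns),
        "median_return": median(returns),
        "worst_return": min(returns),
        "failed_seeds": failed,  # reported explicitly instead of hidden
    }


# Illustrative numbers only; they are not results from any experiment.
report = RunReport(
    algorithm="hypothetical_agent",
    gpu_hours=12.5,
    env_steps=1_000_000,
    returns_by_seed={0: 210.0, 1: 195.0, 2: 12.0, 3: 205.0},
)
summary = summarize(report, failure_threshold=50.0)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Reporting the worst seed and the compute budget alongside the mean lets a reader tell apart a method that is reliably good from one that is good on average, and compare methods only within the same compute range.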