Evaluation Setup
Online Reinforcement Learning on standard benchmarks
Benchmarks:
- Arcade Learning Environment (Atari 2600): discrete control from pixels
- Atari 100K: sample-efficient discrete control
- MuJoCo: continuous control
Metrics:
- Interquartile Mean (IQM) of scores
- Percentage of dormant neurons
- Overlap coefficient of dormant neuron sets (both dormancy metrics are sketched in code after this list)
- Statistical methodology: 95% stratified bootstrap confidence intervals; IQM aggregation
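The two dormancy metrics can be computed directly from a batch of post-activation outputs. Below is a minimal NumPy sketch, assuming the paper's definition of τ-dormancy (a neuron is dormant when its mean absolute activation, normalized by the layer-wide average, is at or below a threshold τ); the function names and the small stabilizing constant are our own illustration:

```python
import numpy as np

def dormant_mask(activations, tau=0.0):
    """Flag tau-dormant neurons in one layer.

    activations: array of shape (batch, num_neurons) holding the layer's
    post-activation outputs on a batch of inputs. tau=0.0 flags only
    exactly-dead units; a larger tau also flags near-dormant ones.
    """
    score = np.abs(activations).mean(axis=0)   # per-neuron mean |h_i(x)|
    score = score / (score.mean() + 1e-9)      # normalize by the layer average
    return score <= tau

def dormant_fraction(activations, tau=0.0):
    """Percentage-of-dormant-neurons metric for one layer."""
    return dormant_mask(activations, tau).mean()

def overlap_coefficient(mask_a, mask_b):
    """Overlap of two dormant sets A, B: |A ∩ B| / min(|A|, |B|)."""
    size_a, size_b = mask_a.sum(), mask_b.sum()
    if min(size_a, size_b) == 0:
        return 0.0
    return np.logical_and(mask_a, mask_b).sum() / min(size_a, size_b)
```

A high overlap coefficient between dormant sets measured at different training steps is what supports the "once inactive, they tend to stay inactive" takeaway below.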
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Atari (17 games) | IQM Score | 0.1 | 1.0 | +0.9 |
| Atari 100K | IQM Score | 0.8 | 1.1 | +0.3 |
| DemonAttack (DQN) | Dormant Neuron Fraction | 0.35 | 0.05 | -0.30 |

Note: ReDo enables effective training at higher replay ratios where standard DQN collapses.
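The replay-ratio results above (and the takeaways below) hinge on what "replay ratio" means operationally: the number of gradient updates performed per environment step. A hypothetical off-policy loop skeleton, with `agent`, `env`, and `buffer` as stand-in interfaces (Gymnasium-style step signature assumed), not an API from the paper:

```python
def train(agent, env, buffer, total_env_steps, replay_ratio=1):
    """Generic off-policy loop; replay_ratio = updates per environment step."""
    obs, _ = env.reset()
    for _ in range(total_env_steps):
        action = agent.act(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, terminated)
        obs = env.reset()[0] if (terminated or truncated) else next_obs

        # A higher replay ratio reuses each transition for more updates;
        # this is the regime where dormant neurons accumulate fastest.
        for _ in range(replay_ratio):
            agent.update(buffer.sample())
```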
Main Takeaways
- The dormant neuron phenomenon is driven by target non-stationarity, not input non-stationarity (confirmed by fixed-target experiments).
- Dormant neurons do not recover on their own; once inactive, they tend to stay inactive (high overlap coefficient).
- Pruning dormant neurons outright does not hurt performance, indicating they contribute little to the learned function; recycling them instead improves performance, indicating they represent reclaimable capacity.
- Higher replay ratios accelerate the creation of dormant neurons, explaining the instability of high-RR training.
- ReDo allows for more aggressive updates (higher replay ratio) without the typical performance penalty.
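The recycling step behind these results is simple to state: periodically identify dormant neurons, re-initialize their incoming weights, and zero their outgoing weights so the reset leaves the network's current outputs unchanged. A minimal PyTorch sketch for one pair of adjacent fully connected layers, assuming the same dormancy score as above; the threshold value, the function name, and the schedule for when to call it are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def redo_recycle(layer_in: nn.Linear, layer_out: nn.Linear,
                 activations: torch.Tensor, tau: float = 0.025) -> int:
    """One ReDo-style recycling step for two adjacent linear layers.

    activations: (batch, layer_in.out_features) post-activation outputs
    of layer_in. Returns the number of neurons recycled.
    """
    score = activations.abs().mean(dim=0)
    score = score / (score.mean() + 1e-9)
    dormant = score <= tau
    if not dormant.any():
        return 0

    # Re-initialize incoming weights of dormant units with a fresh draw
    # from the default nn.Linear initialization (biases reset to zero
    # here for simplicity).
    fresh = torch.empty_like(layer_in.weight)
    nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)
    layer_in.weight[dormant] = fresh[dormant]
    if layer_in.bias is not None:
        layer_in.bias[dormant] = 0.0

    # Zero outgoing weights so recycled units start as a no-op and the
    # network's outputs are unchanged at the moment of the reset.
    layer_out.weight[:, dormant] = 0.0
    return int(dormant.sum())
```

In a training loop this would be invoked every fixed number of gradient steps on a batch drawn from the replay buffer; the output-preserving reset is what lets recycling be applied during training without disrupting the current policy.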