Evaluation Setup
Reinforcement Learning in Partially Observable Environments
Benchmarks:
- POPGym (partially observable RL benchmark suite)
- Meta-Reinforcement Learning (Maze navigation with varying layouts)
- Long-Horizon Credit Assignment (Sparse reward scenarios)
Metrics:
- Cumulative Reward
- Statistical methodology: Not explicitly reported in the paper
Key Results
Theoretical analysis of computational complexity demonstrates the efficiency of the proposed framework compared to standard sequential methods.

| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| Complexity Analysis | Time Complexity | O(t * H^2) | O(log t) | Exponential speedup (in time dimension) |
Main Takeaways
- The paper theoretically proves that standard MANNs suffer from gradient instability when calibration (forgetting) is applied naively.
- The proposed Stable Hadamard Memory can be trained in parallel with O(log t) time complexity in the sequence length t, compared with O(t * H^2) for standard sequential (recurrent) memory models.
- Qualitative claims suggest the model outperforms baselines in tasks requiring selective retention and forgetting (e.g., remembering a key location while ignoring detour steps).
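
The O(log t) claim can be illustrated with a parallel prefix scan. The sketch below is a minimal illustration, not the paper's implementation. It assumes the memory follows an element-wise (Hadamard) recurrence of the form M_t = C_t ⊙ M_{t-1} + U_t, where C_t is a calibration term and U_t an update term; this summary does not state that form, so treat it as an assumption. The function names `combine`, `sequential_memory`, and `scan_memory` are hypothetical. Because composing two such affine steps is associative, all t memory states can be obtained in ceil(log2 t) rounds, where every combination inside a round is independent of the others and could run in parallel.

```python
def combine(earlier, later):
    """Compose two element-wise affine steps m -> c * m + u (earlier applied first)."""
    c1, u1 = earlier
    c2, u2 = later
    c = [a * b for a, b in zip(c1, c2)]              # c2 * c1
    u = [b * x + y for b, x, y in zip(c2, u1, u2)]   # c2 * u1 + u2
    return c, u


def sequential_memory(m0, cs, us):
    """Reference recurrence (assumed form): t dependent steps, one after another."""
    m = list(m0)
    states = []
    for c, u in zip(cs, us):
        m = [ci * mi + ui for ci, mi, ui in zip(c, m, u)]
        states.append(m)
    return states


def scan_memory(m0, cs, us):
    """Hillis-Steele inclusive scan over (c, u) pairs.

    Each round only reads results of the previous round, so the list
    comprehension inside the loop is parallelizable across positions.
    """
    elems = list(zip(cs, us))
    t = len(elems)
    offset, rounds = 1, 0
    while offset < t:
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(t)
        ]
        offset *= 2
        rounds += 1
    # elems[i] now holds the composed map for steps 0..i; apply it to m0.
    states = [[c * m + u for c, m, u in zip(cp, m0, up)] for cp, up in elems]
    return states, rounds


# Toy data: memory with two cells, five time steps.
m0 = [1.0, 2.0]
cs = [[0.5, 1.0], [1.0, 0.5], [2.0, 1.0], [0.5, 2.0], [1.0, 1.0]]
us = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, -1.0]]

seq = sequential_memory(m0, cs, us)
par, rounds = scan_memory(m0, cs, us)
print(seq[-1], par[-1], rounds)
```

Note that this scan variant trades extra total work (on the order of t log t combinations) for logarithmic depth, so the speedup only materializes with enough parallel hardware. The composed calibration factor in `combine` is a running product of C terms, which under the assumed recurrence is also the quantity that can shrink or blow up over long horizons if calibration is left unconstrained.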