
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Kien Do, Dung Nguyen, Sunil Gupta, Svetha Venkatesh
Not reported in the paper
arXiv (2024)
Memory RL Agent

📝 Paper Summary

Memory-Augmented Neural Networks (MANNs) · Reinforcement Learning in POMDPs
The paper introduces Stable Hadamard Memory, a matrix-based memory model for RL that uses dynamic element-wise calibration to selectively erase and strengthen information while ensuring bounded gradients.
Core Problem
Existing deep memory models (MANNs) struggle in partially observable RL environments because they fail to capture long-term information efficiently and suffer from numerical instability (vanishing or exploding gradients) when memory is updated over long episodes.
Why it matters:
  • Agents in POMDPs (Partially Observable Markov Decision Processes) must store and update past information to make optimal decisions
  • Current methods such as the Differentiable Neural Computer (DNC) or Transformers are either too unstable for RL or lack the flexibility to selectively forget and recall information as the context evolves
  • Simple vector baselines (GRU/LSTM) often outperform sophisticated MANNs due to these stability issues
Concrete Example: An agent navigating a room may need to remember a key's location, retain it during a detour, and recall it later. Existing models may fail to 'hold' this memory during the detour (forgetting) or fail to learn the association due to vanishing gradients over the long sequence of detour steps.
Key Novelty
Hadamard Memory Framework (HMF) with Stable Calibration
  • Replaces complex matrix operations with element-wise Hadamard products for memory writing, allowing specific memory cells to be calibrated (erased/strengthened) without mixing content
  • Introduces a dynamic calibration matrix designed to be computationally efficient (parallelizable) while provably bounding the expected value of the memory products, which prevents gradient explosion
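The element-wise write described above can be sketched as follows. This is a minimal illustration, assuming the update takes the form M_t = M_{t-1} ⊙ C_t + U_t, where C_t is a calibration matrix and U_t an update matrix. The function names and the toy values are hypothetical and are not the paper's parameterization; in the paper, C_t and U_t would be produced by learned functions of the current input.

```python
# Illustrative sketch only: function names and toy values are assumptions,
# not the paper's actual parameterization of the calibration or update.

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def memory_step(memory, calibration, update):
    """One memory write, assumed form: M_t = M_{t-1} * C_t + U_t (element-wise)."""
    return add(hadamard(memory, calibration), update)

memory = [[1.0, 2.0], [3.0, 4.0]]

# Per-cell calibration: 0 erases a cell, 1 keeps it, values above 1 strengthen it.
calibration = [[0.0, 1.0], [1.0, 2.0]]

# New content is written only into the erased cell in this toy example.
update = [[0.5, 0.0], [0.0, 0.0]]

new_memory = memory_step(memory, calibration, update)
print(new_memory)
```

Because every cell is scaled by its own calibration entry, erasing the top-left cell leaves the other three cells untouched, which is the "no mixing of content" property the bullet refers to.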
Evaluation Highlights
  • Achieves O(log t) parallel time for processing a length-t sequence via a parallel prefix scan implementation, compared to O(t H^2) for standard sequential matrix updates
  • Reports stronger performance than state-of-the-art memory models on challenging benchmarks, including Meta-RL, long-horizon credit assignment, and POPGym (as claimed by the authors)
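The prefix-scan claim can be illustrated with a small sketch. It assumes the element-wise recurrence m_t = c_t * m_{t-1} + u_t for a single memory cell (each cell is independent under a Hadamard update), and uses a Hillis-Steele style scan that is only simulated round by round in plain Python. The names `combine`, `sequential`, and `scan` are hypothetical. The point is that composing two steps is associative, so all t states can be obtained in about log2(t) rounds when each round's combines run in parallel; note that this particular scan variant performs O(t log t) total work.

```python
# Illustrative sketch only: a simulated scan for one memory cell, not the
# paper's implementation. Each step is the affine map m -> c * m + u.

def combine(left, right):
    """Compose two steps, applying `left` first and `right` second."""
    c1, u1 = left
    c2, u2 = right
    return (c1 * c2, u1 * c2 + u2)

def sequential(cs, us, m0=0.0):
    """Reference: apply the recurrence one step at a time (t dependent steps)."""
    out, m = [], m0
    for c, u in zip(cs, us):
        m = c * m + u
        out.append(m)
    return out

def scan(cs, us, m0=0.0):
    """Inclusive scan over `combine`; combines within a round are independent."""
    elems = list(zip(cs, us))
    step, rounds = 1, 0
    while step < len(elems):
        # The new list is built entirely from the previous round's values.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
        rounds += 1
    # elems[i] now maps the initial state m0 directly to the state after step i.
    return [c * m0 + u for c, u in elems], rounds

cs = [0.5, 2.0, 1.0, 0.0, 1.5]
us = [1.0, 0.0, -1.0, 2.0, 0.5]

seq_result = sequential(cs, us, m0=1.0)
scan_result, rounds = scan(cs, us, m0=1.0)
print(seq_result, scan_result, rounds)
```

On parallel hardware each round would execute as one batched element-wise operation, so the number of rounds, not the sequence length, determines the depth of the computation.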
Breakthrough Assessment
7/10
Provides a theoretically grounded unified framework for memory writing and addresses the critical stability issues of MANNs in RL. Theoretical parallelization speedup is significant.