POMDP: Partially Observable Markov Decision Process, a decision-making framework in which the agent cannot observe the full state of the world and must act on incomplete observations and memory.
Memory rewriting: The process of selectively discarding outdated information from memory and replacing it with new, relevant information.
PPO: Proximal Policy Optimization—a popular reinforcement learning algorithm used for training the agents.
LSTM: Long Short-Term Memory—a type of recurrent neural network capable of learning long-term dependencies, equipped with gating mechanisms for forgetting.
GTrXL: Gated Transformer-XL—a transformer architecture adapted for RL that uses gating to stabilize training and memory caching for long contexts.
SHM: Stable Hadamard Memory, a matrix-structured memory architecture that uses element-wise (Hadamard) products to calibrate stored content and write new information.
Endless T-Maze: A navigation task consisting of an infinite sequence of corridors where directional cues change at every junction, requiring continual memory updates.
Color-Cubes: A grid-world task where agents must find specific colored cubes that may teleport, requiring the agent to update its internal map of object locations.
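
To make the idea of memory rewriting more concrete, here is a minimal sketch of an element-wise gated update, loosely in the spirit of the forget gates mentioned for LSTM above. It is an illustrative simplification, not the update rule of any architecture listed here: the function name `rewrite_memory`, the list-based memory, and the single gate that trades off old content against new content are all assumptions made for the example.

```python
def rewrite_memory(memory, candidate, forget_gate):
    """Element-wise gated update (illustrative assumption, not a published rule).

    For each slot, a gate value of 1.0 keeps the old content and a gate
    value of 0.0 replaces it entirely with the candidate content.
    """
    if not (len(memory) == len(candidate) == len(forget_gate)):
        raise ValueError("memory, candidate, and forget_gate must have equal length")
    return [f * m + (1.0 - f) * c for m, c, f in zip(memory, candidate, forget_gate)]


# Hypothetical values: slot 0 holds an outdated cue, slot 1 is still relevant.
old_memory = [1.0, 0.5]
new_cue = [-1.0, 0.0]
gate = [0.0, 1.0]  # 0.0 = fully overwrite, 1.0 = fully keep

updated = rewrite_memory(old_memory, new_cue, gate)
print(updated)  # [-1.0, 0.5]
```

In a trained agent the gate values would be produced by the network from the current observation rather than set by hand; the fixed values here only show how one slot can be overwritten while another is preserved.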