NaP: Normalize-and-Project—the method proposed in this work, which combines layer normalization with periodic projection of the weights back to a fixed norm so that the effective learning rate remains constant.
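As a rough illustration, the projection step can be sketched in a few lines of numpy. This is a minimal sketch, not the paper's implementation: the helper name `project_weights`, the choice of Frobenius norm, the projection interval, and fixing the radius at the initial weight norm are all illustrative assumptions.

```python
import numpy as np

def project_weights(W, radius):
    """Rescale W so its (Frobenius) norm equals `radius`.

    When the layer consuming W is followed by layer normalization,
    this rescaling leaves the network's outputs unchanged while
    resetting the effective learning rate (illustrative assumption).
    """
    return W * (radius / np.linalg.norm(W))

# Toy loop: plain SGD updates with a periodic projection.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
radius = np.linalg.norm(W)          # fix the radius at the initial norm
for step in range(100):
    grad = rng.normal(size=W.shape)  # stand-in for a real gradient
    W -= 0.01 * grad
    if (step + 1) % 10 == 0:         # project every 10 steps
        W = project_weights(W, radius)
```

After the final projection the weight norm is exactly back at `radius`, even though the SGD updates in between were free to change it.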
Effective Learning Rate (ELR): The actual step size in function space for a scale-invariant network; for normalized layers, ELR scales inversely with the squared parameter norm.
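The inverse-square dependence follows from how gradients of a scale-invariant function scale: if f(cθ, x) = f(θ, x), then ∇f(cθ) = (1/c)∇f(θ), so an SGD step of size η moves the weight *direction* by roughly η/‖θ‖². A finite-difference check of the gradient scaling, using a layer-normalized toy loss (all names here are illustrative, not from the paper):

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def loss(W, x, y):
    # Scale-invariant in W (up to eps) thanks to the layer norm.
    return np.sum((layer_norm(W @ x) - y) ** 2)

def num_grad(fun, W, h=1e-5):
    """Central finite-difference gradient of fun at W."""
    g = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[i] += h
        Wm[i] -= h
        g[i] = (fun(Wp) - fun(Wm)) / (2 * h)
    return g

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x, y = rng.normal(size=3), rng.normal(size=4)
c = 3.0

g1 = num_grad(lambda M: loss(M, x, y), W)
gc = num_grad(lambda M: loss(M, x, y), c * W)
# Gradient norm shrinks by 1/c when the weights grow by c,
# so a fixed step eta changes the direction by ~ eta / ||W||^2.
```

Here `np.linalg.norm(gc)` comes out close to `np.linalg.norm(g1) / c`, confirming the 1/c gradient scaling that makes the ELR decay as the weight norm grows.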
Plasticity: The ability of a neural network to adapt to new data or tasks after being trained on previous data; loss of plasticity refers to the inability to learn new information.
Scale-invariance: A property of a function where scaling the parameters by a constant factor does not change the output (e.g., f(cθ, x) = f(θ, x)), commonly induced by normalization layers.
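A quick numerical check of this property, using a single layer-normalized linear map as the function (a minimal sketch; the function names and shapes are illustrative):

```python
import numpy as np

def layer_norm(z, eps=1e-6):
    # Normalize to zero mean and (approximately) unit variance.
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def f(W, x):
    # Pre-activations followed by layer normalization:
    # scaling W scales the pre-activations, and the normalization
    # divides that scale back out, so the output is unchanged.
    return layer_norm(W @ x)

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))
x = rng.normal(size=3)
c = 7.0
```

With these definitions, `f(c * W, x)` agrees with `f(W, x)` up to the small `eps` term in the normalizer, which is exactly the f(cθ, x) = f(θ, x) property above.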
Rainbow: A state-of-the-art value-based reinforcement learning agent that combines several improvements to DQN (Deep Q-Network), such as distributional RL and multi-step learning.
ALE: Arcade Learning Environment—a benchmark suite of Atari 2600 games used to evaluate reinforcement learning agents.
Neural Tangent Kernel: A kernel function that describes the evolution of a neural network during training in the infinite-width limit, often used to analyze trainability.
C4: Colossal Clean Crawled Corpus—a massive dataset of web text used for training large language models.
Saturated units: Neurons (e.g., ReLU units) whose outputs are stuck at zero or a constant value, blocking gradient flow through them.