Matryoshka-style objective: A loss function in which the update for a given memory position is optimized against a set of nested targets (the current window, current + next, current + next + next, and so on), so that the edit accounts for long-range dependencies across windows
Working memory: The specific internal hidden state at a located layer and position that is modified to alter the model's output
Memory shift: The bias term (delta) added to a hidden state to effectively 'edit' the memory
Affinity: The cosine similarity between the gradients of the loss terms for different target figures (the nested targets of the Matryoshka-style objective), used to dynamically weight each term's importance
Locate-and-edit: A model editing paradigm that identifies specific neurons/layers responsible for a fact and updates them directly without full retraining
Window-by-Window strategy: A baseline approach that splits a long target into segments and updates memories for each segment sequentially and independently
One-for-All strategy: A baseline approach that tries to update a single memory to fix the entire long sequence at once
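
As an illustration of two of the terms above, the following is a minimal sketch of how nested targets could be built from a list of windows and how gradient affinities could be turned into loss weights. The definitions only state that affinity is a cosine similarity between gradients and that it is used for weighting; the particular scheme here (mean cosine similarity to the other terms, clipped at zero and normalized to sum to 1) is an assumption made for the example, and the names `nested_targets`, `cosine`, and `affinity_weights` are hypothetical. Gradients are plain Python lists rather than real model gradients.

```python
import math


def nested_targets(windows):
    """Return cumulative prefixes: [w1], [w1, w2], [w1, w2, w3], ..."""
    return [windows[: k + 1] for k in range(len(windows))]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def affinity_weights(grads):
    """Assumed weighting scheme, not taken from the source:
    score each loss term by its mean cosine similarity to the other terms,
    clip negative scores to zero, then normalize the scores to sum to 1."""
    n = len(grads)
    if n == 1:
        return [1.0]
    scores = []
    for i in range(n):
        sims = [cosine(grads[i], grads[j]) for j in range(n) if j != i]
        scores.append(max(sum(sims) / len(sims), 0.0))
    total = sum(scores)
    if total == 0:
        return [1.0 / n] * n  # fall back to uniform weights
    return [s / total for s in scores]


# Toy example: three windows, one gradient per nested target.
windows = ["w1", "w2", "w3"]
targets = nested_targets(windows)

# The first two gradients agree; the third is orthogonal to both.
grads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = affinity_weights(grads)

print(targets)
print(weights)
```

In this toy case the two loss terms whose gradients agree share all of the weight, while the term whose gradient is orthogonal to the others receives none. A real implementation would compute the gradients with respect to the memory shift and might choose a different normalization.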