regurgitation: The phenomenon where a language model generates training data verbatim
DPO: Direct Preference Optimization, an algorithm that optimizes a language model to align with preferences without training an explicit reward model
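For reference, the DPO objective as introduced by Rafailov et al. (2023) is commonly written as:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right]
```

where $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model, $(y_w, y_l)$ are the preferred and dispreferred responses to prompt $x$, $\sigma$ is the logistic function, and $\beta$ controls how far the policy may drift from the reference.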
ROUGE-L: A metric measuring the longest common subsequence between two texts, used here to detect verbatim overlap between model outputs and training data
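A minimal sketch of how ROUGE-L can be computed over whitespace tokens (function names are illustrative; production work would use a library such as `rouge-score`):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            # Extend the LCS on a match, otherwise carry the best so far.
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate: str, reference: str) -> tuple[float, float, float]:
    """ROUGE-L precision, recall, and F1 over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return p, rec, 2 * p * rec / (p + rec)
```

A high ROUGE-L F1 between a generation and a training document is the signal used to flag regurgitation.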
unlearning: Techniques designed to make a model 'forget' specific subsets of training data
Tulu: An instruction-tuned model family based on Llama, used here as a base for experiments
The Pile: A large-scale, diverse dataset often used for training LLMs
infinite-gram: A method/tool to compute n-gram overlap with extremely large corpora (like The Pile) to measure memorization
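A toy illustration of the quantity such a tool reports: the longest n-gram of a generation that appears verbatim in a corpus. This brute-force version is for intuition only; the real engine (e.g., the infini-gram of Liu et al., 2024) uses suffix arrays to answer these queries over trillion-token corpora.

```python
def _contains(sub: list[str], seq: list[str]) -> bool:
    """Token-level containment check (avoids substring false matches)."""
    return any(seq[i:i + len(sub)] == sub for i in range(len(seq) - len(sub) + 1))


def longest_ngram_overlap(generated: list[str], corpus: list[str]) -> int:
    """Largest n such that some n-gram of `generated` occurs in `corpus`.

    Brute-force toy stand-in for a suffix-array lookup; illustrative only.
    """
    for n in range(len(generated), 0, -1):  # try the longest spans first
        for i in range(len(generated) - n + 1):
            if _contains(generated[i:i + n], corpus):
                return n
    return 0
```

Longer maximal overlaps with the training corpus indicate stronger memorization.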
system prompt: A high-level instruction given to the model (e.g., 'You are a helpful assistant') that governs its behavior for the interaction
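In the common chat-message schema, the system prompt is simply the first message in the conversation (field names follow the widely used OpenAI-style format; the content strings are illustrative):

```python
# The system message steers behavior for the whole interaction;
# subsequent user/assistant turns carry the actual dialogue.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Moby-Dick."},
]
```

Varying this system message is one lever for probing whether a model's regurgitation behavior changes with its instructions.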