Interlingua: A shared, language-independent semantic subspace within a multilingual model's weights where concepts align across languages
Subspace-projection: An unlearning technique that identifies the low-dimensional subspace of weight space associated with a specific task or fact and removes it via orthogonal projection
TOFU: Task of Fictional Unlearning—a benchmark dataset using synthetic authors and facts to test whether models can unlearn specific information without collateral damage
Gradient Ascent: A basic unlearning method that updates model weights to maximize the loss on the specific data to be forgotten, lowering the likelihood the model assigns to that data
NPO: Negative Preference Optimization—an unlearning method that treats the forget data as a 'rejected' sample in a preference optimization framework to discourage its generation
Collateral degradation: Unintended damage to the model's general capabilities or unrelated knowledge during the unlearning process
SISA: Sharded, Isolated, Sliced, and Aggregated. An unlearning framework that trains separate models on disjoint shards of the data, so that forgetting a sample requires retraining only the model for the affected shard (mentioned as a baseline type)
KL divergence: A statistical measure of how one probability distribution differs from a second, reference probability distribution
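
As an illustration of the KL divergence entry, the sketch below computes KL(P || Q) = sum_i p_i * log(p_i / q_i) for two discrete distributions. The function name `kl_divergence` and the two three-word "next-token" distributions are hypothetical and chosen only for the example. Comparing an unlearned model against the original model is an assumed use case, not something these definitions specify.

```python
import math


def kl_divergence(p, q):
    """Return KL(P || Q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same outcomes;
    q plays the role of the reference distribution.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if q_i == 0.0:
            return math.inf  # P puts mass where the reference Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical next-token distributions over a 3-word vocabulary
# (illustrative numbers, not taken from any experiment).
reference = [0.7, 0.2, 0.1]  # assumed: original model
unlearned = [0.6, 0.3, 0.1]  # assumed: model after unlearning

print(kl_divergence(unlearned, reference))
print(kl_divergence(reference, unlearned))
```

The two printed values differ because KL divergence is not symmetric: which distribution is treated as the reference matters. The value is zero only when the two distributions are identical.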