MLP: Multilayer Perceptron, the feed-forward sublayers in a Transformer, hypothesized to store factual key-value memories
MHSA: Multi-Head Self-Attention, the sublayers responsible for routing information between tokens
Causal Tracing: A technique to locate which model activations are decisive for a prediction by corrupting inputs and restoring specific internal states
Locate-then-edit: A paradigm of knowledge editing that first identifies specific weights responsible for a fact and then modifies them
ROME: Rank-One Model Editing, a baseline method that updates a specific MLP layer to insert a new key-value pair for a fact
Over-generalizing: A failure mode where editing a specific fact (e.g., citizenship) incorrectly changes unrelated facts about the same subject (e.g., spouse)
R-Specificity: Relation Specificity, a metric introduced in this paper to measure whether editing one relation of a subject alters unrelated attributes of the same subject
Subject constraints: An optimization term added to RETS to ensure the edit applies only to the target subject and not to other subjects that share the same relation
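
Illustration (rank-one key-value insertion): The ROME entry above describes inserting a key-value pair into an MLP weight matrix. The sketch below shows the simplest form of that idea, assuming an identity key covariance; ROME itself weights the update by an estimated covariance of keys, which is omitted here. The function name rank_one_insert, the matrix sizes, and the random data are hypothetical and chosen only for illustration.

```python
import numpy as np


def rank_one_insert(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' @ k equals v.

    Simplifying assumption: the key covariance is the identity, so the
    update direction is k itself rather than a covariance-weighted key.
    """
    residual = v - W @ k                      # what the layer currently gets wrong for key k
    return W + np.outer(residual, k) / (k @ k)


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))                   # toy stand-in for an MLP projection matrix
k = rng.normal(size=6)                        # key: representation of the edited subject
v = rng.normal(size=4)                        # value: representation of the new fact

W_edited = rank_one_insert(W, k, v)

# A key orthogonal to k is left untouched by this particular update.
other = rng.normal(size=6)
other -= (other @ k) / (k @ k) * k

print(np.allclose(W_edited @ k, v))               # edited key now maps to the new value
print(np.allclose(W_edited @ other, W @ other))   # orthogonal key is unchanged
```

Keys that are not orthogonal to k do shift under such an update, in proportion to their overlap with k. That gives one intuition for why an edit keyed on the subject can spill over into other facts about the same subject, as described under Over-generalizing.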