
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

X Liu, Z Liu, N Gu, Z Lin, W Ma, J Xiang, W Wang
University of Science and Technology of China, Fudan University, Shanghai AI Laboratory
arXiv, 8/2024
Factuality KG

📝 Paper Summary

Knowledge Editing Model Interpretability
RETS improves knowledge editing by modifying MLP weights at the last relation token (where relational knowledge aggregates) rather than the subject token, using a constraint to prevent over-generalization to similar subjects.
Core Problem
Existing locate-then-edit methods (like ROME) focus only on subject tokens and ignore relation information. This leads to over-generalization: unrelated attributes of the edited subject are incorrectly modified along with the target fact.
Why it matters:
  • Current editors damage model reliability by altering unrelated facts (e.g., changing a subject's 'wife' when editing their 'citizenship')
  • Subject-focused interpretations of transformer recall are incomplete, missing the crucial role of relation tokens in aggregating attribute information
  • Practical applications of Large Language Models require precise updates that fix specific errors without cascading side effects on the entity's other knowledge
Concrete Example: When ROME edits the fact <Marco Reus, citizen-of, Britain>, it incorrectly changes unrelated queries like 'Marco Reus's wife is' to Britain-related answers because it modifies the subject representation without considering the specific relation 'citizen-of'.
Key Novelty
Relation-focused Editing with Subject constraints (RETS)
  • Interprets knowledge recall by showing that relation-specific attributes aggregate at the *last relation token* in middle-late MLP layers, not just at the subject token
  • Shifts the editing target from the subject token (standard practice) to this relation-aggregation site to ensure the edit is specific to the relation
  • Applies an optimization constraint during the weight update to distinguish the target subject from 'neighborhood' subjects (same relation, different person), preventing the edit from bleeding into other entities
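The constraint idea in the last bullet can be illustrated on a toy linear layer. This is a minimal sketch under stated assumptions, not the paper's actual optimization: it assumes the edited MLP projection behaves like a linear key-value memory (the view taken by ROME), that the key vectors at the last relation token for the target prompt and for the neighborhood prompts have already been extracted, and it uses a closed-form minimum-norm solution in place of whatever objective RETS optimizes. The function name `constrained_edit` and all variables are hypothetical.

```python
import numpy as np

def constrained_edit(W, k_target, v_new, neighbor_keys):
    """Return an edited weight matrix W_new such that
       W_new @ k_target == v_new          (the edit takes effect), and
       W_new @ k == W @ k for each neighbor key k  (neighbors are untouched).
    Uses the minimum-Frobenius-norm update; assumes the keys are
    linearly independent (a simplifying assumption of this sketch)."""
    # Stack keys as columns: the target key first, then the neighbor keys.
    K = np.column_stack([k_target] + list(neighbor_keys))  # (d_in, 1 + m)
    # Desired change in output for each key: a residual for the target,
    # zero for every neighbor.
    R = np.zeros((W.shape[0], K.shape[1]))
    R[:, 0] = v_new - W @ k_target
    # Minimum-norm solution of dW @ K = R.
    dW = R @ np.linalg.pinv(K)
    return W + dW

# Toy data: random stand-ins for hidden states, not real model activations.
rng = np.random.default_rng(0)
d_in, d_out = 16, 8
W = rng.normal(size=(d_out, d_in))
k_target = rng.normal(size=d_in)
neighbor_keys = rng.normal(size=(3, d_in))  # three "neighborhood" subjects
v_new = rng.normal(size=d_out)

W_new = constrained_edit(W, k_target, v_new, neighbor_keys)
```

In this toy setting the target key maps to the new value while the neighbor keys produce exactly the same outputs as before, which is the behavior the subject constraint is meant to encourage.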
Architecture
Figure 2: Heatmaps of 'Indirect Effect of Relation' (IER) across layers and tokens for GPT2-XL.
Evaluation Highlights
  • Outperforms state-of-the-art locate-then-edit methods (ROME, MEMIT, PMET) by more than 30% on the new R-Specificity metric, which measures side effects on unrelated facts about the edited subject
  • Maintains competitive performance on standard efficacy and generalization metrics compared to baselines
  • Demonstrates that blocking MLP layers causes a sharper drop in attribute retrieval than blocking Attention layers, validating the MLP's role in relational knowledge storage
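A specificity-style score of the kind described in the first bullet can be sketched as follows. The exact definition of R-Specificity is not given in this summary, so the sketch assumes it is the share of probe prompts (same subject, different relation) whose top prediction is unchanged by the edit. The function name `unchanged_rate` and the probe answers are made-up placeholders, not data from the paper.

```python
def unchanged_rate(before, after):
    """Fraction of probe prompts whose top prediction is identical
    before and after an edit. Higher means fewer side effects."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if not before:
        raise ValueError("at least one probe prompt is required")
    same = sum(b == a for b, a in zip(before, after))
    return same / len(before)

# Placeholder predictions for three unrelated-relation probes about the
# edited subject (e.g. wife, birthplace, employer). Values are illustrative.
before = ["wife_name", "birth_city", "club_name"]
after = ["Britain", "birth_city", "club_name"]  # the first probe was disturbed

rate = unchanged_rate(before, after)
```

Here one of the three probes changed after the edit, so the score is two thirds; an editor with no side effects on these probes would score 1.0.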
Breakthrough Assessment
7/10
Strong empirical evidence for a new 'relation-focused' mechanism of knowledge recall. The gain of more than 30% in R-Specificity is significant, though the method is currently limited to single-fact editing.