Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

📝 Paper Summary

Knowledge Editing, Model Interpretability

RETS improves knowledge editing by modifying MLP weights at the last relation token (where relational knowledge aggregates) rather than the subject token, using a constraint to prevent over-generalization to similar subjects.

Core Problem

Existing locate-then-edit methods (like ROME) focus only on subject tokens, ignoring relation information, which leads to over-generalization where unrelated attributes of the subject are incorrectly modified.

Why it matters:

Current editors damage model reliability by altering unrelated facts (e.g., changing a subject's 'wife' when editing their 'citizenship')
Subject-focused interpretations of transformer recall are incomplete, missing the crucial role of relation tokens in aggregating attribute information
Practical applications of Large Language Models require precise updates that fix specific errors without cascading side effects on the entity's other knowledge

Concrete Example: When ROME edits the fact <Marco Reus, citizen-of, Britain>, it incorrectly changes unrelated queries like 'Marco Reus's wife is' to Britain-related answers because it modifies the subject representation without considering the specific relation 'citizen-of'.

Key Novelty

Relation-focused Editing with Subject constraints (RETS)

Interprets knowledge recall by showing that relation-specific attributes aggregate at the *last relation token* in middle-late MLP layers, not just at the subject token
Shifts the editing target from the subject token (standard practice) to this relation-aggregation site to ensure the edit is specific to the relation
Applies an optimization constraint during the weight update to distinguish the target subject from 'neighborhood' subjects (same relation, different person), preventing the edit from bleeding into other entities

Architecture

Heatmaps of 'Indirect Effect of Relation' (IER) across layers and tokens for GPT2-XL.

Evaluation Highlights

Outperforms state-of-the-art locate-then-edit methods (ROME, MEMIT, PMET) by over 30% on the new R-Specificity metric, which measures side effects on unrelated facts about the same subject (0.38 to 0.74 on GPT2-XL)
Maintains competitive performance on standard efficacy and generalization metrics compared to baselines
Demonstrates that blocking MLP layers causes a sharper drop in attribute retrieval than blocking Attention layers, validating the MLP's role in relational knowledge storage

Breakthrough Assessment

7/10

Strong empirical evidence for a new 'relation-focused' mechanism of knowledge recall. The +30% gain in specificity is significant, though the method is currently limited to single-fact editing.

⚙️ Technical Details

Problem Definition

Setting: Single knowledge editing in auto-regressive transformer language models

Inputs: A factual association triplet <s, r, o> (subject, relation, object) and a target edit o*

Outputs: Modified model weights W* such that the model predicts o* for <s, r> while preserving other knowledge

Pipeline Flow

Relation-Focused Causal Tracing (Identify decisive layers/tokens)
Optimization with Subject Constraints (Compute weight update)

System Modules

Relation-Focused Causal Tracing

Identify the exact layer and token position where relational knowledge is aggregated

Model or implementation: GPT2-XL / GPT-J / Llama-2 (depending on experiment)
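The tracing step can be illustrated on a toy model. The sketch below is not the paper's code: it assumes a tiny random causal network in NumPy in place of a transformer, assumes that the "relation" tokens sit at positions 2 and 3, and uses the standard causal-tracing definition of indirect effect (probability of the target after restoring one clean hidden state in a corrupted run, minus the probability in the corrupted run). All names (`run`, `indirect_effects`, `MIX`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L, D, V = 5, 4, 8, 10                 # tokens, layers, hidden size, vocab (toy values)
W = rng.normal(scale=0.5, size=(L, D, D))
U = rng.normal(size=(D, V))              # toy unembedding matrix
MIX = np.tril(np.ones((T, T)))
MIX /= MIX.sum(axis=1, keepdims=True)    # causal averaging: token t sees tokens <= t


def run(emb, patch=None):
    """Forward pass; patch=(layer, token, vector) overwrites one hidden state.

    Returns the output distribution at the last token and the per-layer states.
    """
    h = emb.copy()
    states = []
    for layer in range(L):
        h = h + np.tanh(MIX @ h @ W[layer])
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]
        states.append(h.copy())
    logits = h[-1] @ U
    p = np.exp(logits - logits.max())
    return p / p.sum(), states


def indirect_effects(emb, corrupt_positions, target, noise_scale=3.0):
    """Indirect effect of restoring each (layer, token) state in a corrupted run."""
    p_clean, clean_states = run(emb)
    noisy = emb.copy()
    noisy[corrupt_positions] += rng.normal(
        scale=noise_scale, size=(len(corrupt_positions), D)
    )
    p_corrupt, _ = run(noisy)
    ie = np.zeros((L, T))
    for layer in range(L):
        for tok in range(T):
            p_restored, _ = run(noisy, patch=(layer, tok, clean_states[layer][tok]))
            ie[layer, tok] = p_restored[target] - p_corrupt[target]
    return p_clean, p_corrupt, ie


emb = rng.normal(size=(T, D))
target = int(np.argmax(run(emb)[0]))
# Assumption: positions 2 and 3 play the role of the relation tokens.
p_clean, p_corrupt, ie = indirect_effects(emb, corrupt_positions=[2, 3], target=target)
print(np.round(ie, 3))                   # rows: layers, columns: token positions
```

Plotting `ie` as a layers-by-tokens heatmap gives the kind of picture the IER heatmaps summarize; in a real model the states would be restored with forward hooks rather than an explicit loop.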

RETS Optimizer

Calculate the rank-one update for the MLP weights at the identified location

Model or implementation: Closed-form solution (similar to ROME but different target)
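Since the update is described as closed-form and similar to ROME, a minimal sketch can use ROME's published rank-one formula as a stand-in. Assumptions: `k_star` is the MLP key at the edit site (for RETS, the last relation token), `v_star` is the target value, `C` is an uncentered covariance of previously seen keys, and the RETS subject constraint is not modeled. The function name and toy dimensions are hypothetical.

```python
import numpy as np


def rank_one_edit(W, C, k_star, v_star):
    """ROME-style update: W_hat = W + lam (C^{-1} k*)^T, so that W_hat @ k* == v*."""
    c_inv_k = np.linalg.solve(C, k_star)     # C^{-1} k*, without forming the inverse
    residual = v_star - W @ k_star           # what the current weights get wrong
    lam = residual / (c_inv_k @ k_star)      # scale so the constraint holds exactly
    return W + np.outer(lam, c_inv_k)


rng = np.random.default_rng(0)
d_in, d_out, n_keys = 16, 8, 200             # toy sizes, not from the paper
W = rng.normal(size=(d_out, d_in))
K = rng.normal(size=(d_in, n_keys))          # stand-in for previously stored keys
C = K @ K.T / n_keys
k_star = rng.normal(size=d_in)
v_star = rng.normal(size=d_out)

W_hat = rank_one_edit(W, C, k_star, v_star)
print(np.allclose(W_hat @ k_star, v_star))   # the edited weights map k* to v*
print(np.linalg.matrix_rank(W_hat - W))      # the change to W has rank one
```

Weighting the update direction by `C^{-1}` is what keeps the change small on keys that resemble previously stored ones, which is the "preserve other knowledge" half of the problem definition.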

Novel Architectural Elements

Targeting the MLP at the *last relation token* (middle-late layers) instead of the last subject token (middle-early layers)
Inclusion of a 'Subject Constraint' optimization term to enforce distinguishability between the target subject and other subjects sharing the relation

Modeling

Base Model: GPT2-XL (1.5B), GPT-J (6B), Llama-2 (7B)

Training Method: Locate-then-edit (direct weight modification)

Objective Functions:

Purpose: Maximize probability of target object given subject+relation.

Formally: Standard ROME-style equality constraint W* k* = v* (the edited weights must map the key k* at the edit site to the target value v*)

Purpose: Differentiate target subject from others to prevent bleeding.

Formally: Optimization target ensuring hidden representations of neighborhood subject prompts remain distinct at the edit site
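For reference, the ROME-style problem behind the equality constraint can be written as follows. This restates ROME's formulation in this summary's notation; K and V (previously stored keys and values), m (the MLP output at the edit site), and p (the prompt built from subject and relation) are symbols assumed here. ROME's additional regularization terms are omitted, and the exact form of the RETS subject-constraint term is not given in this summary, so it is not reproduced.

```latex
\begin{aligned}
W^{*} &= \arg\min_{\hat{W}} \; \lVert \hat{W} K - V \rVert_F^{2}
\quad \text{subject to} \quad \hat{W} k^{*} = v^{*}, \\
v^{*} &= \arg\min_{z} \; -\log P_{m := z}\!\left( o^{*} \mid p \right)
\end{aligned}
```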

Key Hyperparameters:

attributes_rate_k: 50 (top-k tokens for attribute analysis)
neighborhood_sample_size: Not explicitly reported in the paper
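To make the role of k concrete, the sketch below computes an attributes rate under an assumed definition: project a hidden state to vocabulary space, take the top-k tokens, and report the fraction that belong to a set of attribute tokens. The definition, the function name, and the toy identity unembedding are assumptions for illustration, not details confirmed by this summary.

```python
import numpy as np


def attributes_rate(hidden, unembed, attribute_ids, k=50):
    """Fraction of the top-k projected tokens that are in the attribute set."""
    logits = hidden @ unembed                    # project hidden state to vocab space
    top_k = np.argpartition(logits, -k)[-k:]     # indices of the k largest logits
    hits = set(top_k.tolist()) & set(attribute_ids)
    return len(hits) / k


vocab = 100                                      # toy vocabulary size
unembed = np.eye(vocab)                          # toy unembedding: logits == hidden
hidden = np.arange(vocab, dtype=float)           # token i gets logit i
rate = attributes_rate(hidden, unembed, attribute_ids={3, 97, 98, 99}, k=5)
print(rate)
```

In the toy call the top five tokens are 95 through 99, three of which are in the attribute set. Tracking this rate per layer and per token position is one way to see where attribute information accumulates.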

Compute: Single GPU (inference-level compute for tracing and matrix update)

Comparison to Prior Work

vs. ROME: RETS edits at the last relation token (middle-late layers) instead of subject token (middle-early layers) and adds subject constraints.
vs. MEMIT: RETS is a single-edit method focusing on relation specificity, whereas MEMIT spreads updates for bulk editing.
vs. PMET: PMET still follows the subject-focused paradigm; RETS shifts to relation-focused editing to fix over-generalization.

Limitations

Current formulation is designed for single knowledge editing, not batch editing (unlike MEMIT).
Requires identifying 'neighborhood subjects' for the constraint optimization, so each edit depends on additional data beyond the edited triplet.
The paper focuses on factual associations (triplets) and may not generalize to other types of reasoning tasks.

Reproducibility

Code: https://github.com/sunshower-liu/RETS

The dataset used is COUNTERFACT, supplemented with a new R-Specificity metric. The paper lists specific layers used for editing in different models (e.g., layers 0-36 for aggregation analysis in GPT2-XL).

📊 Experiments & Results

Evaluation Setup

Knowledge editing on the COUNTERFACT dataset with a new metric added

Benchmarks:

COUNTERFACT (Modified) (Fact editing and retrieval)

Metrics:

R-Specificity (Relation Specificity - NEW)
Efficacy (ES)
Generalization (PS)
Locality (NS)

Statistical methodology: Not explicitly reported in the paper
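As a rough illustration of what a relation-specificity score measures, the sketch below counts how many prompts about the edited subject, but with other relations, keep the same prediction after the edit. This is an assumed simplification, not the paper's exact R-Specificity definition; the function, the dictionary-based "models", and the placeholder answers are hypothetical.

```python
from typing import Callable, Sequence


def r_specificity_sketch(
    predict_before: Callable[[str], str],
    predict_after: Callable[[str], str],
    unrelated_prompts: Sequence[str],
) -> float:
    """Share of same-subject, other-relation prompts whose answer is unchanged."""
    if not unrelated_prompts:
        raise ValueError("need at least one prompt")
    unchanged = sum(predict_before(p) == predict_after(p) for p in unrelated_prompts)
    return unchanged / len(unrelated_prompts)


# Placeholder predictions standing in for a model before and after an edit.
before = {
    "Marco Reus's wife is": "answer_A",
    "Marco Reus plays the sport of": "answer_B",
    "Marco Reus was born in": "answer_C",
    "Marco Reus plays for": "answer_D",
}
after = dict(before)
after["Marco Reus's wife is"] = "a Britain-related answer"   # over-generalized edit

score = r_specificity_sketch(before.get, after.get, list(before))
print(score)
```

A score of 1.0 would mean the edit left every unrelated attribute of the subject untouched; the single changed answer in the toy data lowers it to three out of four.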

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Comparative performance on GPT2-XL (1.5B), showing RETS's superior specificity.
COUNTERFACT (GPT2-XL)	R-Specificity	0.38	0.74	+0.36
COUNTERFACT (GPT2-XL)	Efficacy (ES)	1.00	0.99	-0.01
Validation of the relation-focused hypothesis via attributes rate analysis.
WikiData Analysis	Spearman Correlation	0.00	0.97	+0.97

Experiment Figures

Analysis of attribute rates and object rankings across layers.

Main Takeaways

RETS improves R-Specificity by more than 30% over ROME, MEMIT, and PMET (0.38 to 0.74 on GPT2-XL), substantially reducing the over-generalization problem in which editing one fact alters unrelated attributes of the same subject.
Blocking MLP layers during inference causes a much larger drop in attribute retrieval rates than blocking MHSA layers, supporting the view that MLPs are the primary storage for relational knowledge.
Relational knowledge aggregation happens gradually from early layers and peaks at the last relation token in middle-late layers, later than where previous methods (ROME) attempt to edit.
The method maintains high Efficacy and Generalization scores (e.g., Efficacy 0.99 vs. 1.00 for the baseline on GPT2-XL), indicating that the gain in specificity does not come at the cost of editing success.

📚 Prerequisite Knowledge

Prerequisites

Transformer architecture (MLP vs. Attention layers)
Causal Tracing (identifying important hidden states)
Rank-One Model Editing (ROME)
Auto-regressive language modeling

Key Terms

MLP: Multilayer Perceptron—the feed-forward sublayers in a Transformer, hypothesized to store factual key-value memories

MHSA: Multi-Head Self-Attention—sublayers responsible for routing information between tokens

Causal Tracing: A technique to locate which model activations are decisive for a prediction by corrupting inputs and restoring specific internal states

Locate-then-edit: A paradigm of knowledge editing that first identifies specific weights responsible for a fact and then modifies them

ROME: Rank-One Model Editing—a baseline method that updates a specific MLP layer to insert a new key-value pair for a fact

Over-generalizing: A failure mode where editing a specific fact (e.g., citizenship) incorrectly changes unrelated facts about the same subject (e.g., spouse)

R-Specificity: Relation Specificity—a new metric introduced in this paper to measure whether editing a relation affects unrelated attributes of the same subject

Subject constraints: An optimization term added to RETS to ensure the edit applies only to the target subject and not other subjects with the same relation