Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

📝 Paper Summary

Mechanistic Interpretability, Knowledge Representation in LLMs, Temporal Reasoning

The paper identifies specific 'Temporal Heads' in LLMs that are exclusively responsible for processing time-dependent facts, showing that disabling them degrades temporal recall without affecting general capabilities.

Core Problem

LLMs struggle to accurately represent temporal knowledge—facts that change over time (e.g., 'President in 2004' vs 'President in 2008')—unlike static facts.

Why it matters:

Real-world facts like political terms or sports team rosters evolve, requiring models to track changes rather than memorizing a single static answer
Current understanding of how LLMs internally organize and recall this time-specific information is limited compared to static factual recall
Without locating where temporal processing happens, it is difficult to intervene or edit outdated temporal knowledge effectively

Concrete Example: When asked 'In 1999, [X] was a member of sports team', the model must retrieve the team for that specific year. Existing analyses do not explain how the model distinguishes this query from the same query beginning 'In 2004', which makes temporal mismatches hard to diagnose.

Key Novelty

Discovery of 'Temporal Heads' via Temporal Knowledge Circuits

Applies circuit analysis to identify specific attention heads (e.g., a15.h0 in Llama-2) that activate exclusively for time-conditioned queries
Demonstrates that these heads bind temporal conditions (years or text aliases) to subjects, distinct from heads used for static or common sense knowledge
Proposes 'Temporal Knowledge Editing' by manipulating the activations of these specific heads to correct or reinforce time-specific factual recall

Architecture

Conceptual diagram of Temporal Knowledge Circuits showing how specific heads (Temporal Heads) activate for time-specific queries versus time-invariant queries.

Evaluation Highlights

Ablating Temporal Heads in Llama-2 significantly increases the probability assigned to non-target (wrong-year) answers, while time-invariant knowledge remains stable
Identified heads respond to both numeric years ('In 2004') and textual aliases ('In the year the Summer Olympics were held'), confirming semantic temporal encoding
Ablation minimally impacts general QA performance (TriviaQA, Math), with F1 score drops of less than 0.6

Breakthrough Assessment

7/10

Strong mechanistic interpretability contribution identifying specific components for temporal processing. The finding that these heads are 'exclusive' to temporal tasks is significant, though the scope is limited to a few models and simple fact retrieval.

⚙️ Technical Details

Problem Definition

Setting: Analyzing internal model activations to identify subgraphs (circuits) responsible for predicting time-specific object o_k given subject s, relation r, and time T_k.

Inputs: Prompt containing a temporal condition (e.g., 'In 2004') and a factual query

Outputs: Identification of critical attention heads and MLPs; modified logits after ablation or editing
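
The setting above can be made concrete with a small data structure. The following is a minimal sketch only: the TemporalFact class, the prompt template, and the placeholder subject and team names are hypothetical and are not taken from the paper's dataset. It shows how a time-conditioned prompt and its target object o_k (versus non-target objects from other years) could be derived from (s, r, o_k, T_k) records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalFact:
    """One (subject, relation, object, year) record; all field names are hypothetical."""
    subject: str
    relation: str
    obj: str
    year: int


def build_prompt(fact: TemporalFact) -> str:
    # Assumed template, modeled on the example 'In 1999, [X] was a member of sports team'.
    return f"In {fact.year}, {fact.subject} {fact.relation}"


def targets_for(facts, subject, relation, year):
    """Return the target object for the given year and the non-target objects from other years."""
    matching = [f for f in facts if f.subject == subject and f.relation == relation]
    target = next((f.obj for f in matching if f.year == year), None)
    non_targets = sorted({f.obj for f in matching if f.obj != target})
    return target, non_targets


# Placeholder records, not real data.
facts = [
    TemporalFact("Player X", "was a member of sports team", "Team A", 1999),
    TemporalFact("Player X", "was a member of sports team", "Team B", 2004),
]
prompt = build_prompt(facts[0])
target, non_targets = targets_for(facts, "Player X", "was a member of sports team", 1999)
```

Keeping the non-target objects alongside the target makes it straightforward to check later whether an intervention pushes the model toward an answer from the wrong year.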

Pipeline Flow

Input Prompt Processing (Time-conditioned)
Circuit Discovery (EAP-IG Pruning)
Temporal Head Identification
Ablation / Editing Inference

System Modules

Circuit Discovery

Identify the subgraph of model components responsible for temporal factual recall

Model or implementation: Llama-2-7b-chat-hf / Qwen1.5-7B-Chat / Phi-3-mini-4k-instruct

Temporal Heads

Specific attention heads that bind time information to factual knowledge

Model or implementation: Specific heads like a15.h0, a18.h3 (Llama-2)
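
To illustrate what disabling one of these heads means, here is a minimal sketch of zero ablation on a toy attention layer. It assumes that a label such as a15.h0 denotes layer 15, head 0, and that each head's output has already been projected to the model width, so the layer's write to the residual stream is the sum of per-head contributions. The helper names and the numeric values are made up for illustration; this is not the paper's implementation.

```python
import re


def parse_head(label: str) -> tuple[int, int]:
    """Parse a label such as 'a15.h0' into (layer, head). This reading of the label is an assumption."""
    match = re.fullmatch(r"a(\d+)\.h(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized head label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def attention_output(head_outputs, ablated=frozenset()):
    """Sum per-head contributions to the residual stream, skipping ablated heads.

    head_outputs: one vector per head, each already projected to model width.
    ablated: indices of heads whose contribution is replaced by zeros.
    """
    width = len(head_outputs[0])
    total = [0.0] * width
    for index, vector in enumerate(head_outputs):
        if index in ablated:
            continue  # zero ablation: this head writes nothing to the residual stream
        for i, value in enumerate(vector):
            total[i] += value
    return total


layer, head = parse_head("a15.h0")
toy_heads = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]  # made-up outputs for three heads
full = attention_output(toy_heads)
without_head = attention_output(toy_heads, ablated={head})
```

In a real model the same effect would typically be obtained by intercepting the attention module's per-head output during the forward pass; how that is done depends on the library in use.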

Novel Architectural Elements

Concept of 'Temporal Heads' as a distinct functional group within existing Transformer architectures (not a new architecture, but a new functional designation)

Modeling

Base Model: Llama-2-7b-chat-hf, Qwen1.5-7B-Chat, Phi-3-mini-4k-instruct

Compute: Not reported in the paper

Comparison to Prior Work

vs. General Knowledge Circuits (Yao et al., 2024): This paper extends the concept to 'Temporal Knowledge Circuits', specifically isolating time-dependent edges.
vs. Standard Causal Tracing: This paper focuses on 'heads' rather than just layers, providing finer-grained localization for temporal processing.

Limitations

Analysis is limited to models of up to 7B parameters (Llama-2-7b, Qwen1.5-7B, and the smaller Phi-3-mini); scaling to larger models is unexplored.
Focuses on declarative facts (Subject-Relation-Object tuples); complex temporal reasoning (e.g., event ordering, duration) is not covered.
Threshold selection for circuit pruning involves hyperparameters that may affect which heads are deemed 'temporal'.
Evaluation relies heavily on probability shifts and exact match on specific datasets, which may not capture all nuances of generation.

Reproducibility

Code: https://github.com/dmis-lab/TemporalHead

Code and datasets are publicly available at https://github.com/dmis-lab/TemporalHead. The paper details the pruning method (EAP-IG), thresholds, and specific heads identified for Llama-2.

📊 Experiments & Results

Evaluation Setup

Zero-shot knowledge retrieval under temporal conditions.

Benchmarks:

Temporal Knowledge Dataset (Fact Retrieval (Year-specific)) [New]
Time-Invariant Knowledge Dataset (Commonsense/Static Facts) [New]
TriviaQA, Math, ChroKnowledge (General QA)

Metrics:

Circuit Reproduction Score (CRS)
Log-Probability of Target vs Non-Target
Exact Match (EM) / F1 Score
Statistical methodology: Not explicitly reported in the paper
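
For reference, Exact Match and F1 for short-answer QA are often computed at the token level after light normalization. The sketch below follows that common convention (lowercasing, stripping punctuation and English articles); whether the paper uses exactly this normalization is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "Chicago Bulls team" against the reference "Chicago Bulls" has precision 2/3 and recall 1, giving an F1 of 0.8 while Exact Match is 0.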

Key Results

Circuit Reproduction Scores (CRS) indicate that the discovered circuits (subgraphs) effectively retain the full model's ability to recall knowledge, validating the circuit analysis method.
Ablation studies show that disabling Temporal Heads harms temporal fact retrieval significantly more than static knowledge.

Benchmark	Metric	Baseline	This Paper	Δ
Temporal Knowledge Dataset	CRS	0	50	+50
Temporal Knowledge Dataset	Target Probability Drop	0	-10	-10
General QA (TriviaQA)	F1 Score Drop	0	0.6	0.6

Experiment Figures

Visualization of attention heads in the extracted circuits, highlighting the overlap and exclusivity of heads for temporal vs invariant tasks.

Impact of ablating Temporal Heads on log-probabilities across different years.
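
The log-probability comparison behind this kind of plot can be sketched as follows. This is an illustrative example only: the logits are made-up numbers over three placeholder candidates, and target_margin is a hypothetical helper, not a function from the paper's code. It measures how far the target answer sits above the strongest non-target answer, before and after an intervention.

```python
import math


def log_softmax(logits: dict) -> dict:
    """Numerically stable log-softmax over a dict of candidate -> logit."""
    peak = max(logits.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return {token: v - log_norm for token, v in logits.items()}


def target_margin(logits: dict, target: str, non_targets: list) -> float:
    """Log-probability of the target minus the best non-target log-probability."""
    log_probs = log_softmax(logits)
    return log_probs[target] - max(log_probs[t] for t in non_targets)


# Made-up logits for the same prompt with and without an ablation.
clean = {"Team A": 4.0, "Team B": 2.0, "Team C": 0.0}
ablated = {"Team A": 2.5, "Team B": 3.0, "Team C": 0.0}

margin_clean = target_margin(clean, "Team A", ["Team B"])
margin_ablated = target_margin(ablated, "Team A", ["Team B"])
```

A positive margin means the target year's answer is preferred; a margin that turns negative after ablation corresponds to the model favoring an answer from the wrong period.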

Main Takeaways

Specific 'Temporal Heads' (e.g., a15.h0, a18.h3 in Llama-2) exist and are crucial for binding time conditions to facts.
These heads are specialized: ablating them degrades temporal recall but leaves common sense and general QA largely intact.
The heads activate for both explicit numeric years ('2004') and semantic aliases ('Year of Athens Olympics'), suggesting deep semantic temporal encoding.
Models exhibit 'Temporal Mismatch' when these heads are disabled: they tend to retrieve facts from the wrong time period rather than produce incoherent output.

📚 Prerequisite Knowledge

Prerequisites

Mechanistic Interpretability (Circuit Analysis)
Transformer Architecture (Attention Heads, MLPs, Residual Stream)
Integrated Gradients

Key Terms

Circuit Analysis: A technique to reverse-engineer neural networks by identifying a subgraph of nodes (heads/MLPs) and edges responsible for a specific behavior

Temporal Heads: Specific attention heads identified in this paper that are critically involved in processing time-dependent knowledge but inactive for static facts

EAP-IG: Edge Attribution Patching with Integrated Gradients, a method that scores each edge in a model's computation graph by its estimated contribution to the output (using integrated gradients) and prunes low-scoring edges to leave a circuit
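
The integrated gradients component can be illustrated on a toy function. The sketch below is not EAP-IG itself and does not operate on model edges: it assumes a two-input function with a hand-written gradient, and approximates the path integral from a baseline to the input with a midpoint Riemann sum. All names are illustrative.

```python
def f(x):
    """Toy scalar function standing in for a model output."""
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0], 3.0]


def integrated_gradients(grad, x, baseline, steps=50):
    """Approximate integrated gradients along the straight path from baseline to x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(len(x)):
            avg_grad[i] += g[i] / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)
```

A useful sanity check is completeness: the attributions should sum to f(x) - f(baseline), which holds here up to floating-point error.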

Circuit Reproduction Score (CRS): A metric measuring how well a pruned circuit preserves the performance of the full model, scaled from 0 to 100

Knowledge Circuit: A subgraph of the model dedicated to storing and relaying factual content for a specific subject-relation-object triplet

Ablation: The process of zeroing out specific components (like attention heads) to measure their impact on model performance