Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries

📝 Paper Summary

Mechanistic Interpretability · Multi-hop Reasoning · Internal Representations

LLMs resolve the first hop of a multi-hop query in early layers and the second hop in late layers; failures occur when the first hop resolves so late that the remaining layers can no longer carry out the second hop.

Core Problem

Large Language Models often fail to correctly answer multi-hop queries (e.g., 'The spouse of the performer of Imagine is') even when they possess the knowledge for each individual hop.

Why it matters:

Latent multi-hop reasoning is critical for compositional generalization, allowing models to answer complex questions without explicit training on every combination
Understanding internal mechanisms is essential for reliable model editing and debugging, as treating the model as a black box hinders targeted improvements
Current interpretability methods (like vocabulary projection) often fail to detect entity resolution in early layers, leading to an incomplete understanding of how reasoning fails

Concrete Example: For the query 'The spouse of the performer of Imagine is', the model must first identify 'John Lennon' (hop 1) and then 'Yoko Ono' (hop 2). If the model identifies Lennon too late in its layers, the remaining layers may lack the specific functionality (MLP/Attention) needed to extract Ono, causing the model to output an incorrect answer despite knowing both facts individually.

Key Novelty

Sequential Latent Reasoning Pathway & Back-Patching Analysis

Discovers that the 'bridge entity' (result of hop 1) is encoded in early layers at the first-hop token position, while the final answer (hop 2) appears only in late layers at the final token position
Proposes 'back-patching': manually moving the hidden representation of the bridge entity from a later layer to an earlier layer to give the model more 'depth' to compute the second hop

Architecture

A conceptual diagram of the multi-hop reasoning process and the back-patching intervention.

Evaluation Highlights

Up to 66% of initially incorrect multi-hop queries can be corrected by back-patching the bridge entity representation to an earlier layer
In 41%-78% of cases, the bridge entity is successfully resolved in the hidden representation of the first hop's end-token, even when the final answer is wrong
The second hop (final answer) is predominantly resolved by MLP sublayers in the upper layers of the model, specifically at the last token position

Breakthrough Assessment

7/10

Provides strong mechanistic evidence for why multi-hop reasoning fails (layer budget exhaustion) and introduces a novel intervention (back-patching) that significantly recovers performance without training.

⚙️ Technical Details

Problem Definition

Setting: Latent multi-hop reasoning where a query requires composing two facts: (e1, r1, e2) and (e2, r2, e3) to predict e3 given e1 and relations r1, r2.

Inputs: Natural language query representing a two-hop relation (e.g., 'The spouse of the performer of Imagine is')

Outputs: Target entity e3 (e.g., 'Yoko Ono')
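
The composition described above can be sketched with a toy lookup table. This is only an illustration of the task structure, not of how a model stores facts; the `FACTS` dictionary and the helper names `two_hop` and `make_query` are hypothetical, and the query template is copied from the example above.

```python
from typing import Dict, Optional, Tuple

# Toy knowledge graph (illustrative): (subject, relation) -> object.
FACTS: Dict[Tuple[str, str], str] = {
    ("Imagine", "performer"): "John Lennon",
    ("John Lennon", "spouse"): "Yoko Ono",
}


def two_hop(e1: str, r1: str, r2: str,
            facts: Dict[Tuple[str, str], str] = FACTS) -> Optional[Tuple[str, str]]:
    """Compose (e1, r1, e2) and (e2, r2, e3).

    Returns (e2, e3), or None if either hop is missing from the table.
    """
    e2 = facts.get((e1, r1))      # hop 1: resolve the bridge entity
    if e2 is None:
        return None
    e3 = facts.get((e2, r2))      # hop 2: resolve the target entity
    if e3 is None:
        return None
    return e2, e3


def make_query(e1: str, r1: str, r2: str) -> str:
    """Render the two-hop relation as a natural language prompt."""
    return f"The {r2} of the {r1} of {e1} is"


print(make_query("Imagine", "performer", "spouse"))
print(two_hop("Imagine", "performer", "spouse"))
```

In the latent setting studied here, the model sees only the rendered query and has to perform both lookups internally, without ever emitting e2 as text.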

Pipeline Flow

Input Processing (Tokenization of multi-hop query)
Early Layers (Resolution of Bridge Entity e2)
Information Propagation (Moving e2 info to last token)
Late Layers (Resolution of Target Entity e3 via MLP)

System Modules

Transformer Block (Early)

Extracts the bridge entity (e2) from the first part of the query

Model or implementation: LLaMA-2/3 or Pythia (various sizes)

Transformer Block (Late)

Uses the bridge entity info to extract the target entity (e3) at the final token position

Model or implementation: LLaMA-2/3 or Pythia (various sizes)

Novel Architectural Elements

Back-patching intervention mechanism: Loops the hidden state of a specific token from layer L back to layer L-k during inference to extend computational budget
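
A minimal sketch of the intervention on a toy layer stack may make the mechanics concrete. It is not the authors' implementation: the `forward` and `back_patch` helpers are hypothetical, hidden states are plain Python lists, and the toy layer treats each position independently, whereas real Transformer layers mix positions through attention.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Hidden = List[Vector]                 # one vector per token position
Layer = Callable[[Hidden], Hidden]
Patch = Tuple[int, int, Vector]       # (layer index, position, replacement vector)


def forward(layers: List[Layer], hidden: Hidden,
            patch: Optional[Patch] = None) -> List[Hidden]:
    """Run the layers in order and return the output of every layer.

    If `patch` is given, the output of layer `patch[0]` at position
    `patch[1]` is overwritten before the next layer runs.
    """
    states = []
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if patch is not None and patch[0] == i:
            hidden = [list(v) for v in hidden]   # copy before overwriting
            hidden[patch[1]] = list(patch[2])
        states.append(hidden)
    return states


def back_patch(layers: List[Layer], inputs: Hidden, position: int,
               source_layer: int, target_layer: int) -> Hidden:
    """Re-run the model with a later-layer state injected at an earlier layer."""
    if not target_layer < source_layer:
        raise ValueError("target_layer must be earlier than source_layer")
    clean = forward(layers, inputs)
    vector = clean[source_layer][position]       # state taken from the later layer
    return forward(layers, inputs, patch=(target_layer, position, vector))[-1]


# Toy stack: every layer adds 1 to every component.
def add_one(h: Hidden) -> Hidden:
    return [[x + 1.0 for x in v] for v in h]


layers = [add_one] * 4
print(forward(layers, [[0.0], [0.0]])[-1])
print(back_patch(layers, [[0.0], [0.0]], position=0, source_layer=2, target_layer=0))
```

In the toy run, the patched position ends up having passed through two more layer applications than the unpatched one, which is the sense in which back-patching gives the representation extra depth.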

Modeling

Base Model: Analyzed LLaMA-2 (7B, 13B), LLaMA-3 (8B, 70B), and Pythia (6.9B, 12B)

Comparison to Prior Work

vs. Vocabulary Projections: Patchscopes detects entities in early layers where direct projection fails [not cited in paper but implied comparison]
vs. Linear Probing: Patchscopes provides natural language descriptions rather than binary classifications, requiring no training
vs. Chain-of-Thought [not cited in paper]: Focuses on latent (internal) reasoning without generating intermediate text tokens

Limitations

Back-patching requires searching for optimal source and target layers, which varies by example
Analysis relies on the Patchscopes framework which is an approximation of the information content
Focuses only on two-hop queries; scalability to n-hop queries is not tested
Does not provide a method to automatically determine when back-patching is necessary during inference

Reproducibility

Code: https://github.com/edenbiran/HoppingTooLate

The code is publicly available. The release includes the dataset of 82,020 two-hop queries and the code for the Patchscopes and back-patching analyses.

📊 Experiments & Results

Evaluation Setup

Zero-shot evaluation of multi-hop queries on pre-trained LLMs. Results are compared between a 'Correct' subset (the model answers the two-hop query correctly) and an 'Incorrect' subset (the model answers each single hop correctly but gets the two-hop query wrong).
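
The subset split can be sketched as a simple filter. This is a hedged illustration: the record fields (`hop1_ok`, `hop2_ok`, `two_hop_ok`) and the function name `split_subsets` are hypothetical, and the sketch assumes that both subsets are restricted to queries whose individual hops the model answers correctly.

```python
from typing import Dict, List, Tuple


def split_subsets(records: List[Dict]) -> Tuple[List[str], List[str]]:
    """Partition queries into 'Correct' and 'Incorrect' subsets.

    Each record is assumed (hypothetically) to hold:
      query      - the two-hop prompt
      hop1_ok    - model answers the first single-hop fact correctly
      hop2_ok    - model answers the second single-hop fact correctly
      two_hop_ok - model answers the composed two-hop query correctly
    """
    correct, incorrect = [], []
    for r in records:
        if not (r["hop1_ok"] and r["hop2_ok"]):
            continue  # assumption: skip queries where a single-hop fact is unknown
        if r["two_hop_ok"]:
            correct.append(r["query"])
        else:
            incorrect.append(r["query"])
    return correct, incorrect


records = [
    {"query": "q1", "hop1_ok": True, "hop2_ok": True, "two_hop_ok": True},
    {"query": "q2", "hop1_ok": True, "hop2_ok": True, "two_hop_ok": False},
    {"query": "q3", "hop1_ok": False, "hop2_ok": True, "two_hop_ok": False},
]
print(split_subsets(records))
```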

Benchmarks:

Custom Wikidata Multi-Hop Dataset (Multi-hop reasoning QA) [New]

Metrics:

Patchscopes Accuracy (Is the entity decoded?)
Back-patching Restoration Rate (percentage of incorrect queries fixed)
Statistical methodology: Not explicitly reported in the paper
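
As a rough sketch of the restoration-rate metric, the function below counts a query as fixed if at least one (source layer, target layer) back-patch yields the correct answer. That "any pair" criterion is an assumption, chosen because the best layers vary by example; the function name and input layout are hypothetical.

```python
from typing import Dict, Tuple


def restoration_rate(results: Dict[str, Dict[Tuple[int, int], bool]]) -> float:
    """Percentage of initially incorrect queries fixed by some back-patch.

    `results` maps each incorrect query to a dict from
    (source_layer, target_layer) to whether that patch gave the right answer.
    """
    if not results:
        return 0.0
    fixed = sum(1 for outcomes in results.values() if any(outcomes.values()))
    return 100.0 * fixed / len(results)


example = {
    "q1": {(20, 5): False, (24, 8): True},   # fixed by one pair
    "q2": {(20, 5): False, (24, 8): False},  # never fixed
    "q3": {(20, 5): True, (24, 8): True},    # fixed by several pairs
}
print(round(restoration_rate(example), 1))
```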

Key Results

Patchscopes reveals that the bridge entity (e2) is resolved early in the model's layers, while the target entity (e3) is resolved late.

Benchmark	Metric	Baseline	This Paper	Δ
Custom Wikidata Dataset	e2 Resolution Rate (Correct Subset)	Not applicable	78% (LLaMA-3-70B)	Not applicable
Custom Wikidata Dataset	e2 Resolution Rate (Incorrect Subset)	Not applicable	63% (LLaMA-3-70B)	Not applicable

Back-patching experiments demonstrate that 'hopping too late' is a primary cause of failure; giving the model more layers restores accuracy.

Benchmark	Metric	Baseline	This Paper	Δ
Custom Wikidata Dataset	Restoration Rate	0%	66% (LLaMA-3-70B)	+66 pp

Experiment Figures

Cumulative probability curves showing the layer at which entities (e2 and e3) are first resolved.

Heatmap of Patchscopes success rates for decoding e2 from t1 across different source and target layers.
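
A cumulative resolution curve of the kind described above could be computed as follows. This is a sketch under assumptions: each example is summarized by the first layer at which the entity is decoded (or `None` if it never is), and the function name `cumulative_resolution` is hypothetical.

```python
from typing import List, Optional


def cumulative_resolution(first_layers: List[Optional[int]],
                          num_layers: int) -> List[float]:
    """Fraction of examples whose entity is first resolved at or before each layer."""
    n = len(first_layers)
    if n == 0:
        return [0.0] * num_layers
    counts = [0] * num_layers
    for layer in first_layers:
        if layer is not None:            # None: entity never decoded
            counts[layer] += 1
    curve, running = [], 0
    for c in counts:
        running += c
        curve.append(running / n)
    return curve


# Four examples in a 4-layer toy model; one is never resolved.
print(cumulative_resolution([0, 2, 2, None], num_layers=4))
```

Comparing such curves for e2 and e3, or for the correct and incorrect subsets, shows which one tends to resolve at a later layer.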

Main Takeaways

The bridge entity (result of hop 1) is typically resolved in early layers at the first-hop token position.
The target entity (result of hop 2) is resolved in late layers at the final token position.
In incorrect cases, the first hop is often resolved successfully but at a later layer than in correct cases.
Back-patching (moving state from late to early layer) corrects up to 66% of failures, suggesting the failure is due to running out of layers (computational depth) rather than lack of knowledge.

📚 Prerequisite Knowledge

Prerequisites

Transformer architecture (Attention vs. MLP sublayers, residual streams)
Mechanistic Interpretability basics (probing hidden states)
Knowledge Graph concepts (triplets of entity-relation-entity)

Key Terms

bridge entity: The intermediate entity (e2) that connects the two hops (e.g., 'John Lennon' in the query about Imagine's performer's spouse)

Patchscopes: A framework that translates hidden representations into natural language descriptions by patching them into a separate prompt, used here to decode what entity is encoded in a vector

vocabulary projection: A method to interpret a hidden state by multiplying it with the output embedding matrix to see which token it most strongly predicts

back-patching: A proposed analysis method where a hidden state from a later layer is injected back into an earlier layer at the same position to simulate having more computation depth

latent reasoning: The internal process where a model computes intermediate steps (like finding the bridge entity) implicitly within its hidden states without outputting them as text

MLP sublayer: The Feed-Forward Network component of a Transformer block, often hypothesized to act as a key-value memory for factual knowledge

hop: A single step of reasoning in a knowledge graph, moving from one entity to another via a relation
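
To make the vocabulary projection term concrete, here is a minimal sketch with toy numbers. The vocabulary, matrix values, and function name are illustrative assumptions; real implementations typically also apply the model's final normalization layer before projecting, which is omitted here.

```python
from typing import List


def vocabulary_projection(hidden: List[float],
                          unembedding: List[List[float]],
                          vocab: List[str],
                          top_k: int = 1) -> List[str]:
    """Project a hidden state onto the vocabulary and return the top tokens.

    `unembedding` holds one row per vocabulary token; each logit is the dot
    product of that row with the hidden state.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembedding]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:top_k]]


vocab = ["Lennon", "Ono", "Paris"]
unembedding = [
    [1.0, 0.0],    # "Lennon"
    [0.0, 1.0],    # "Ono"
    [-1.0, -1.0],  # "Paris"
]
hidden = [0.2, 0.9]
print(vocabulary_projection(hidden, unembedding, vocab, top_k=2))
```

Because the projection only reads off what the state predicts as the next token, an entity that is encoded but not yet aligned with the output space can be missed.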