
How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning

Subhabrata Dutta, Joykirat Singh, Soumen Chakrabarti, Tanmoy Chakraborty
Trans. Mach. Learn. Res. (2024)
Reasoning · Factuality · QA

📝 Paper Summary

Mechanistic Interpretability · Reasoning in Large Language Models
This mechanistic analysis of Llama-2 7B reveals a 'functional rift' around layer 16, where the model transitions from processing ontological knowledge (the pretraining prior) to generating answers via parallel pathways (the in-context prior).
Core Problem
While Chain-of-Thought (CoT) prompting significantly improves LLM reasoning, the internal neural mechanisms (circuits) that implement this ability remain largely unknown.
Why it matters:
  • Understanding internal mechanisms is necessary to explain why reasoning is often brittle to unrelated changes in the prompt
  • Current literature observes CoT behavior via input/output perturbation but treats the underlying neural algorithms as a black box
  • Verifying whether models actually rely on their generated reasoning steps (causality) vs. just hallucinating explanations requires mechanistic evidence
Concrete Example: Given a prompt like 'Numpuses are rompuses... Max is a numpus...', we do not know whether the model logically deduces 'Max is a rompus' using specific attention heads or merely pattern-matches memorized text. This paper investigates whether specific heads move the ontological property from 'numpus' to 'rompus' to enable the deduction.
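To make the setup concrete, here is a minimal sketch of how a fictional-ontology probe of this kind can be composed. The helper name and template are illustrative assumptions, not the paper's exact prompt format:

```python
# Hypothetical sketch of a fictional-ontology deduction prompt
# ("numpus" -> "rompus"), in the style of the example above.
# build_prompt and its template are illustrative, not the paper's code.

def build_prompt(entity, concept_a, concept_b):
    """Compose a one-hop ontology prompt: concept_a is-a concept_b, entity is-a concept_a."""
    facts = f"{concept_a.capitalize()}es are {concept_b}es. {entity} is a {concept_a}."
    question = f"Is {entity} a {concept_b}?"
    # The deduction a faithful chain-of-thought should produce:
    expected_chain = (
        f"{entity} is a {concept_a}. "
        f"{concept_a.capitalize()}es are {concept_b}es. "
        f"So {entity} is a {concept_b}."
    )
    return f"{facts} {question}", expected_chain

prompt, chain = build_prompt("Max", "numpus", "rompus")
print(prompt)  # → Numpuses are rompuses. Max is a numpus. Is Max a rompus?
```

Because the concepts are invented, any correct answer must come from in-context deduction rather than memorized world knowledge, which is what makes this setup useful for mechanistic analysis.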
Key Novelty
Functional Rift and Parallel Answer Pathways
  • Identifies a distinct phase shift in the model's middle layers (the 'functional rift'): early layers process static relationships (ontology) while later layers handle dynamic context and answer generation
  • Discovers that CoT reasoning is not a single serial process but involves multiple 'parallel pathways' where different attention heads simultaneously collect information to write the answer
Evaluation Highlights
  • Identified a functional rift at the 16th decoder block where token representations shift from pretraining priors to in-context priors
  • Localized attention heads responsible for ontological information transfer (moving properties between entities) primarily to the first 16 layers
  • Demonstrated that answer writing heads appear almost exclusively at or after the 16th decoder block, indicating a structured depth-wise specialization
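The depth-wise analysis above can be illustrated with a toy sketch: treat each decoder block's token representation as a vector and find the first layer where it aligns more with an in-context direction than with a pretraining-prior direction. The vectors below are synthetic stand-ins, not activations from Llama-2 7B:

```python
# Toy illustration (not the paper's code): locating a "functional rift" as the
# first layer where a representation aligns more with an in-context direction
# than with a pretraining-prior direction. All vectors are synthetic.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def find_rift(layer_reprs, prior_dir, context_dir):
    """Return the first layer index where context alignment exceeds prior alignment."""
    for layer, h in enumerate(layer_reprs):
        if cosine(h, context_dir) > cosine(h, prior_dir):
            return layer
    return None

# Synthetic representations drifting from the prior axis toward the context axis
# across 32 decoder blocks (the depth of Llama-2 7B).
prior_dir, context_dir = [1.0, 0.0], [0.0, 1.0]
layers = [[1.0 - t / 31, t / 31] for t in range(32)]
print(find_rift(layers, prior_dir, context_dir))  # → 16
```

In the real analysis the "directions" would come from probing actual residual-stream activations; this sketch only shows the shape of the crossover computation.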
Breakthrough Assessment
8/10
Provides a significant step forward in opening the 'black box' of CoT reasoning by mapping abstract reasoning steps to specific model layers and attention heads, moving beyond behavioral analysis.