Hierarchical Reasoning Model

Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Chang-Le Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi-Yadkori
Sapient Intelligence, Singapore
arXiv.org (2025)
Tags: Reasoning · RL · Benchmark

📝 Paper Summary

Keywords: Neurosymbolic reasoning · Recurrent Neural Networks (RNNs) · System 2 reasoning
HRM mimics the human brain's hierarchical processing by using coupled slow and fast recurrent modules to perform deep latent reasoning without chain-of-thought supervision or expensive backpropagation through time.
Core Problem
Standard LLMs rely on brittle Chain-of-Thought (CoT) prompting for reasoning, which is computationally shallow, token-inefficient, and dependent on extensive human supervision.
Why it matters:
  • Fixed-depth Transformers fall into shallow circuit-complexity classes (AC^0/TC^0), preventing them from solving polynomial-time reasoning problems end-to-end without external scratchpads
  • Chain-of-Thought requires generating slow, expensive token sequences and massive training data, yet remains fragile to single-step errors
  • Naive recurrent models suffer from vanishing gradients and from memory costs that grow linearly with sequence length (O(T)) under Backpropagation Through Time (BPTT)
Concrete Example: In complex Sudoku puzzles (Sudoku-Extreme Full), state-of-the-art CoT methods fail completely (0% accuracy) because they cannot effectively search and backtrack, whereas HRM solves them with near-perfect accuracy using latent computation.
Key Novelty
Hierarchical Reasoning Model (HRM)
  • Couples a 'high-level' slow module (planning) with a 'low-level' fast module (execution); the high-level state updates only after the low-level module converges to a local equilibrium
  • Replaces Backpropagation Through Time (BPTT) with a memory-efficient O(1) one-step gradient approximation based on Deep Equilibrium Models theory
  • Incorporates an Adaptive Computational Time (ACT) mechanism where the model learns to pause and 'think' for variable durations based on problem complexity
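The coupled two-timescale recurrence can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: the module names, hidden width, cycle counts, and tanh cells are all assumptions, and the random matrices stand in for trained weights. The key structural idea shown is that the fast (low-level) state updates every step while the slow (high-level) state moves only once per cycle, after the fast module has settled.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # hidden width (illustrative)
T = 4           # low-level steps per high-level update (illustrative)
N = 3           # number of high-level "planning" cycles (illustrative)

# Random weights stand in for the trained low/high recurrent cells.
W_L = rng.normal(scale=0.3, size=(D, 3 * D))   # low cell:  (z_L, z_H, x) -> z_L
W_H = rng.normal(scale=0.3, size=(D, 2 * D))   # high cell: (z_H, z_L)    -> z_H

def low_step(z_L, z_H, x):
    """Fast module: updates every step, conditioned on the slow state."""
    return np.tanh(W_L @ np.concatenate([z_L, z_H, x]))

def high_step(z_H, z_L):
    """Slow module: updates only after the fast module has converged."""
    return np.tanh(W_H @ np.concatenate([z_H, z_L]))

def hrm_forward(x):
    z_L = np.zeros(D)
    z_H = np.zeros(D)
    for _ in range(N):              # N slow planning cycles
        for _ in range(T):          # T fast execution steps per cycle
            z_L = low_step(z_L, z_H, x)
        z_H = high_step(z_H, z_L)   # slow state moves once per cycle
    return z_H

out = hrm_forward(rng.normal(size=D))
print(out.shape)   # (16,)
```

During training, the paper's O(1) one-step gradient would backpropagate only through the final update of each module (treating earlier states as fixed points, per Deep Equilibrium Model theory) rather than unrolling all N*T steps as BPTT would.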
Evaluation Highlights
  • Achieves 40.3% on the ARC-AGI benchmark with only 27M parameters and 1000 training examples, outperforming o3-mini-high (34.5%) and Claude 3.7 (21.2%)
  • Solves 'Sudoku-Extreme Full' with near-perfect accuracy (~98-99%) using only 1000 samples, while GPT-4o and o1-mini score ~0%
  • Achieves 100% accuracy on optimal pathfinding in 30x30 mazes where CoT-based baselines fail completely (0%)
Breakthrough Assessment
9/10
HRM demonstrates a radical efficiency jump: beating massive LLMs (Claude 3.7, o3-mini) on ARC-AGI with a tiny 27M parameter model trained on just 1k examples. It validates a non-Transformer, biologically plausible reasoning path.