
How do language models learn facts? Dynamics, curricula and hallucinations

Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De
ETH Zürich, Google DeepMind
arXiv, 3/2025 (2023)
Factuality Pretraining Memory

📝 Paper Summary

Knowledge internalization Pre-training dynamics
Language models acquire factual knowledge in three distinct phases (learning overall statistics, a performance plateau, and recall of specific facts). The transition out of the plateau is driven by the formation of attention-based recall circuits during it.
Core Problem
The mechanisms governing how large language models move from general language understanding to precise factual recall during pre-training remain poorly understood.
Why it matters:
  • Understanding knowledge compression is crucial as LLMs become primary gateways to human knowledge
  • Data distribution dependencies for efficient training are not well characterized
  • Distinguishing between flexible knowledge and rigid memorization is essential for preventing data leakage
Concrete Example: A model with flexible knowledge can state that Paris is the capital of France however the question is phrased, whereas a model that has only memorized the sentence 'Paris is the capital of France' can reproduce just that exact string. Current models struggle to integrate new individuals via fine-tuning without rapidly corrupting existing memories.
Key Novelty
Three-phase Knowledge Acquisition Dynamics
  • Identifies a 'plateau' phase where performance stalls while the model builds internal attention circuits needed to route information for factual recall
  • Demonstrates that imbalanced data distributions shorten this plateau but slow down final knowledge acquisition, suggesting a dynamic curriculum could optimize training
  • Shows that 'hallucinations' (overconfident wrong answers) emerge simultaneously with genuine knowledge acquisition
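The data-scheduling idea in the second bullet can be illustrated with a small sampler. This is a minimal sketch under stated assumptions, not the authors' setup: the power-law weighting, the `alpha` exponent, the `warmup_steps` switch point, and all function names are hypothetical choices made for illustration. It draws individuals from an imbalanced distribution early in training and from a uniform one afterwards.

```python
import random

def sampling_weights(num_individuals, alpha):
    """Unnormalized power-law weights over individuals; alpha = 0 is uniform."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_individuals + 1)]

def curriculum_alpha(step, warmup_steps, alpha_start=1.0):
    """Hypothetical schedule: imbalanced during warm-up, uniform afterwards."""
    return alpha_start if step < warmup_steps else 0.0

def sample_batch(step, num_individuals, batch_size, warmup_steps, rng):
    """Draw a batch of individual indices under the current schedule."""
    alpha = curriculum_alpha(step, warmup_steps)
    weights = sampling_weights(num_individuals, alpha)
    return rng.choices(range(num_individuals), weights=weights, k=batch_size)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
early = sample_batch(step=0, num_individuals=100, batch_size=1000,
                     warmup_steps=500, rng=rng)
late = sample_batch(step=500, num_individuals=100, batch_size=1000,
                    warmup_steps=500, rng=rng)
```

In this sketch, a few frequent individuals dominate the early batches, which the summary associates with a shorter plateau. Switching to uniform sampling afterwards is one possible way to avoid the slower final acquisition attributed to imbalanced data.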
Evaluation Highlights
  • Replacing attention patterns with those from late-training checkpoints eliminates the plateau phase entirely, indicating that the plateau is caused by circuit formation
  • Plateau length grows almost linearly with the number of individuals in the dataset (population size N)
  • Fine-tuning fails to add new knowledge effectively: models hallucinate immediately and existing memories in feed-forward layers are rapidly corrupted
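To make "plateau length" concrete, one way to measure it from a logged loss curve is sketched below. This is an illustrative assumption rather than the paper's measurement procedure: the function name, the threshold-based definition, and the toy curve are all hypothetical. The plateau is taken to be the number of steps between the loss first reaching the plateau level and first dropping clearly below it.

```python
def plateau_length(losses, plateau_loss, tolerance=0.05):
    """Count steps from reaching the plateau level until the loss drops below it.

    losses: per-step loss values (hypothetical logging format).
    plateau_loss: loss level associated with the plateau (assumed known).
    tolerance: relative band around plateau_loss.
    """
    upper = plateau_loss * (1 + tolerance)
    lower = plateau_loss * (1 - tolerance)
    # First step at which the loss has come down to the plateau band.
    start = next((i for i, loss in enumerate(losses) if loss <= upper), None)
    if start is None:
        return 0  # the curve never reached the plateau level
    # First later step at which the loss falls clearly below the band.
    end = next((i for i in range(start, len(losses)) if losses[i] < lower),
               len(losses))
    return end - start

# Toy curve for illustration only (not data from the paper).
toy_losses = [3.0, 2.5, 2.0, 2.0, 2.0, 2.0, 1.5, 1.0]
length = plateau_length(toy_losses, plateau_loss=2.0)
```

Applying such a measurement to runs with different population sizes N would be one way to check the near-linear growth described above.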
Breakthrough Assessment
8/10
Provides fundamental mechanistic insights into LLM training dynamics (the three phases and the role of attention circuits) and proposes actionable data scheduling strategies.