
Adaptive Loops and Memory in Transformers: Think Harder or Know More?

Markus Frey, Behzad Shomali, Ali Hamza Bashir, David Berghaus, Joachim Koehler, Mehdi Ali
Lamarr Institute for Machine Learning and Artificial Intelligence, Fraunhofer Institute for Intelligent Analysis and Information Systems, University of Bonn
arXiv (2026)
Memory, Reasoning, Pretraining, Benchmark

πŸ“ Paper Summary

Memory internalization, Implicit reasoning
Combining adaptive layer looping with learnable static memory banks allows transformers to dynamically balance algorithmic reasoning (thinking harder) and factual retrieval (knowing more), outperforming deeper baselines on math tasks.
Core Problem
Looped transformers improve reasoning efficiency by iterating over hidden states but lack the parameter capacity of deeper models, causing performance drops on knowledge-intensive tasks.
Why it matters:
  • Standard Chain-of-Thought (CoT) requires generating expensive intermediate tokens, motivating implicit reasoning within hidden states
  • Looping offers parameter efficiency but sacrifices the storage capacity typically found in the unique weights of deep networks
  • Current methods force a trade-off: choose looped models for logic/math or deep models for knowledge/commonsense, rather than excelling at both
Concrete Example: A looped model might solve a multi-step algebra problem efficiently by iterating, but fail a commonsense QA task because it lacks the unique parameters to store diverse world facts, unlike a standard 36-layer model.
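The capacity gap in this example can be illustrated with rough parameter accounting. The sketch below assumes the common rule of thumb of about 12 * d_model^2 weights per transformer block, and uses hypothetical sizes (d_model = 1024, 12 unique layers looped 3 times) chosen only so that the looped model matches the depth of the 36-layer model mentioned above; these sizes are not taken from the paper.

```python
def block_params(d_model: int) -> int:
    """Approximate weights in one transformer block (biases and norms ignored).

    Attention contributes 4 * d^2 (Q, K, V and output projections) and a
    4x-wide MLP contributes 8 * d^2, giving the usual 12 * d^2 estimate.
    """
    return 12 * d_model * d_model


def model_stats(unique_layers: int, loops: int, d_model: int) -> dict:
    """Return the unique parameter count and the number of block applications."""
    return {
        "params": unique_layers * block_params(d_model),
        "block_applications": unique_layers * loops,
    }


# Hypothetical sizes, for illustration only.
looped = model_stats(unique_layers=12, loops=3, d_model=1024)
deep = model_stats(unique_layers=36, loops=1, d_model=1024)

# Both models apply 36 blocks per forward pass ...
print(looped["block_applications"], deep["block_applications"])
# ... but the deep model has three times as many unique weights to store facts in.
print(deep["params"] / looped["params"])
```

Under these assumptions the two models do the same amount of sequential computation, while the looped model has only a third of the block weights, which is the storage shortfall the memory banks are meant to offset.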
Key Novelty
Adaptive Looped Transformer with Gated Memory Banks
  • Augments a looped transformer with learned static memory banks (local per-layer and global shared) that are retrieved via attention during loops
  • Uses an adaptive halting mechanism (PonderNet-style) to let each layer dynamically decide how many times to iterate its computation
  • Introduces input-dependent gating to blend retrieved memory with the residual stream, allowing the model to choose when to access memory versus just computing
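The three components above can be sketched together in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar sigmoid gate, the single memory bank, the `tanh` stand-in for the shared transformer block, and all names (`memory_read`, `gated_memory_update`, `adaptive_loop`) are hypothetical. The halting probabilities follow the PonderNet form p_n = lambda_n * prod_{j<n}(1 - lambda_j); averaging hidden states under that distribution is a simplification chosen for brevity.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(h, keys, values):
    """Attend from hidden state h over a static bank of (key, value) slots."""
    scale = math.sqrt(len(h))
    weights = softmax([dot(h, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def gated_memory_update(h, keys, values, gate_w, gate_b):
    """Add retrieved memory to the residual stream, scaled by an input-dependent gate."""
    m = memory_read(h, keys, values)
    g = sigmoid(dot(gate_w, h) + gate_b)  # scalar gate in (0, 1), computed from h
    return [hi + g * mi for hi, mi in zip(h, m)], g


def adaptive_loop(h, step_fn, halt_w, halt_b, max_loops=3):
    """Iterate step_fn, weighting each iteration's state by p(halt at step n)."""
    still_running = 1.0
    output = [0.0] * len(h)
    halt_probs = []
    for n in range(max_loops):
        h = step_fn(h)
        lam = sigmoid(dot(halt_w, h) + halt_b)  # conditional halting probability
        if n == max_loops - 1:
            lam = 1.0  # force a halt on the last step so the probabilities sum to 1
        p_n = still_running * lam
        halt_probs.append(p_n)
        output = [o + p_n * x for o, x in zip(output, h)]
        still_running *= 1.0 - lam
    return output, halt_probs


# Toy usage with random, untrained parameters (hypothetical sizes).
random.seed(0)
d, slots = 4, 8
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(slots)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(slots)]
gate_w = [random.gauss(0, 1) for _ in range(d)]
halt_w = [random.gauss(0, 1) for _ in range(d)]


def step(h):
    h = [math.tanh(x) for x in h]  # stand-in for the shared transformer block
    h, _ = gated_memory_update(h, keys, values, gate_w, 0.0)
    return h


out, probs = adaptive_loop([0.5, -0.2, 0.1, 0.9], step, halt_w, 0.0)
print(probs, sum(probs))
```

In the described architecture each layer makes its own halting decision and memory exists in both per-layer and shared banks; the sketch collapses this to one loop and one bank to keep the control flow visible.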
Evaluation Highlights
  • Loop-3 model with memory improves Math BPB by 4.2% over the Loop-3 model without memory
  • Outperforms an Iso-FLOP baseline with 3x the layers on math benchmarks (1.687 BPB vs. 1.801 BPB; lower is better)
  • Memory banks recover ~2% accuracy on commonsense tasks compared to loop-only models, closing the capacity gap
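For readers less familiar with the metric, bits per byte (BPB) is the model's total negative log-likelihood converted to bits and divided by the number of bytes of text, so lower is better. The short sketch below shows that conversion and works out the relative gap between the two BPB figures quoted above, assuming that 1.687 belongs to the looped model with memory and 1.801 to the Iso-FLOP baseline. The function names are illustrative.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def relative_reduction(baseline: float, model: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to the baseline."""
    return (baseline - model) / baseline


# BPB values quoted in the highlights above.
iso_flop_bpb = 1.801
looped_memory_bpb = 1.687

gap = relative_reduction(iso_flop_bpb, looped_memory_bpb)
print(f"{gap:.1%}")  # prints 6.3%
```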
Breakthrough Assessment
7/10
Provides clear evidence of layer specialization (early layers loop less, later layers loop more) and demonstrates that memory banks effectively mitigate the capacity bottleneck of looped transformers.