
Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

Borun D Chowdhury
arXiv (2026)
Memory Pretraining QA

📝 Paper Summary

Architectural Inductive Bias, Position Bias in LLMs, Transformer Theory
The U-shaped performance curve in LLMs is a geometric property of the causal decoder architecture with residual connections. It is already present at initialization, so it is not merely a learned artifact or a side effect of positional encoding.
Core Problem
LLMs exhibit a 'Lost in the Middle' U-shaped performance curve where retrieval is poor in the middle of the context, but the mechanistic cause is debated between learned artifacts and positional encoding decay.
Why it matters:
  • Engineering efforts to fix this often target symptoms (e.g., modifying RoPE) rather than the root cause
  • Current theories rely on circular logic using parameters from trained networks, failing to identify architectural priors
  • The performance degradation significantly limits the effective utility of long-context windows in modern LLMs
Concrete Example: In a multi-document QA task, a model retrieves information perfectly if the relevant document is at the start or end of the prompt, but fails if the exact same document is placed in the middle, regardless of content relevance.
Key Novelty
Geometric Birthright Theory of Position Bias
  • Models multi-layer causal attention as iterated powers of the Cesàro matrix to derive a closed-form influence density at initialization (Step 0)
  • Proves causal masking mathematically forces a logarithmic Primacy Tail (attention sinks), while residual connections force an isolated Recency Anchor (delta spike)
  • Demonstrates that the 'middle' is a factorial dead zone where gradient influence is diluted by causal mixing, creating a structural valley before training begins
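The three points above can be illustrated with a small numerical sketch. It rests on assumptions that this summary does not confirm: attention at initialization is treated as exactly uniform over the causal prefix (which makes each attention layer the Cesàro averaging matrix), and the residual connection is modelled as an equal-weight mix of identity and attention, (I + C) / 2. The function names are hypothetical, and the paper's own normalization may differ.

```python
def cesaro_matrix(n):
    """Uniform causal attention: row i averages positions 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def layer_matrix(n, residual=True):
    """One layer's mixing matrix under the assumptions stated above."""
    a = cesaro_matrix(n)
    if not residual:
        return a
    # Assumed normalization: equal-weight mix of skip path and attention.
    return [[0.5 * (a[i][j] + (1.0 if i == j else 0.0)) for j in range(n)]
            for i in range(n)]


def last_token_influence(n, depth, residual=True):
    """Last row of the depth-th matrix power: how much each input
    position contributes to the final token after `depth` layers."""
    m = layer_matrix(n, residual)
    row = [0.0] * n
    row[-1] = 1.0  # start from the final token
    for _ in range(depth):
        # Row vector times matrix, repeated `depth` times.
        row = [sum(row[k] * m[k][j] for k in range(n)) for j in range(n)]
    return row


if __name__ == "__main__":
    n, depth = 64, 4
    with_res = last_token_influence(n, depth, residual=True)
    no_res = last_token_influence(n, depth, residual=False)
    for name, prof in (("residual", with_res), ("no residual", no_res)):
        print(name, "first/middle/last:", prof[0], prof[n // 2], prof[-1])
```

In this toy model the influence decays from the first position toward the end of the sequence in both cases, while only the residual variant keeps a large isolated weight on the final position. That matches the primacy tail and recency anchor described above in qualitative terms; it does not reproduce the paper's closed-form density.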
Evaluation Highlights
  • The theoretical continuous density equations achieve a Spearman correlation of 0.99 with the empirical Jacobian of untrained 24-layer Qwen2 models
  • Wasserstein distance between theoretical prediction and empirical initialization is 0.02, confirming the closed-form Cesàro operator captures discrete topology
  • The empirical Jacobian profile is nearly identical (Spearman correlation 0.99) with or without RoPE at initialization, which rules out RoPE as the root cause
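For readers unfamiliar with the two metrics above, here is a minimal sketch of how a Spearman correlation and a one-dimensional Wasserstein distance can be computed between two per-position influence profiles. The helper names are hypothetical, and the choice to rescale positions to [0, 1] and normalize each profile to unit mass is an assumption. The summary does not say how the paper normalizes either quantity.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def wasserstein_1d(p, q):
    """W1 distance between two non-negative profiles on the same grid,
    with positions placed at i / (n - 1) and each profile normalized."""
    n = len(p)
    sp, sq = sum(p), sum(q)
    cp = cq = total = 0.0
    # Sum |CDF_p - CDF_q| over the n - 1 gaps between grid points.
    for a, b in zip(p[:-1], q[:-1]):
        cp += a / sp
        cq += b / sq
        total += abs(cp - cq)
    return total / (n - 1)
```

Spearman compares only the ordering of positions by influence, so it measures agreement in shape. The Wasserstein distance also accounts for how much mass sits where, which is why the two are reported together.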
Breakthrough Assessment
9/10
Provides the first exact, closed-form proof that 'Lost in the Middle' is an architectural initialization property, resolving conflicting theories and fundamentally shifting the understanding of transformer geometry.