
RePo: Language Models with Context Re-Positioning

H Li, T Zhao, R Sproat
arXiv, 12/2025
Memory Pretraining Reasoning

📝 Paper Summary

Context Management Position Encoding
RePo replaces rigid linear position indices in LLMs with a learnable re-positioning mechanism that dynamically assigns token positions based on relevance, improving performance on noisy and long-context tasks.
Core Problem
Standard LLMs assign fixed linear or constant position indices to tokens, imposing a rigid structure that fails to reflect actual information relevance.
Why it matters:
  • Rigid structures increase extraneous cognitive load, wasting finite working memory (attention capacity) on organizing disordered information rather than on reasoning
  • Tasks requiring long-range dependencies (e.g., needle-in-a-haystack) suffer because linear positioning imposes a locality bias, making distant but relevant tokens harder to attend to
  • Linear assignment treats all context as equally spaced, limiting the model's ability to group related information or ignore noise
Concrete Example: In a 'needle-in-a-haystack' task, the critical answer (the needle) is buried far from the question (the query) amid irrelevant text. Standard RoPE attention favors nearby tokens because of its locality bias. RePo can instead assign the needle a position value close to the query's position, so the model can attend to it despite the long linear distance.
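The locality bias in this example can be observed in a toy RoPE computation. The sketch below is plain Python using the standard RoPE frequencies (base 10000). The vector size, the all-ones query and key, the positions, and the "re-assigned" needle position 1997.5 are illustrative assumptions, not values from the paper. It checks three things. First, the RoPE logit depends only on the difference between query and key positions. Second, for this toy query and key, the logit is lower when the needle sits 2000 tokens back than when it sits 2 tokens back. Third, because RoPE accepts real-valued positions, giving the distant needle a position near the query restores a high logit, which is the effect RePo learns to produce from content.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate each (even, odd) pair of `vec` by pos * theta_j (standard RoPE).

    `pos` may be any real number, not just an integer index.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # theta_j = base^(-2j/d) with i = 2j
        c, s = math.cos(angle), math.sin(angle)
        out.extend([vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c])
    return out


def rope_logit(q, k, pos_q, pos_k):
    """Unscaled attention logit between q at pos_q and k at pos_k."""
    return sum(a * b for a, b in zip(rope_rotate(q, pos_q), rope_rotate(k, pos_k)))


d = 64
q = [1.0] * d  # toy query vector (assumption)
k = [1.0] * d  # toy key vector standing in for the needle (assumption)

# 1) The logit depends only on pos_q - pos_k (relative-position property).
shift_invariant = math.isclose(
    rope_logit(q, k, 100, 3), rope_logit(q, k, 97, 0), abs_tol=1e-9
)

# 2) Locality: with the query at position 2000, a needle 2 tokens back
#    gets a higher logit than a needle 2000 tokens back.
near = rope_logit(q, k, 2000, 1998)
far = rope_logit(q, k, 2000, 0)

# 3) A content-based re-positioning could move the distant needle to a
#    hypothetical real-valued position close to the query (1997.5 here).
repositioned = rope_logit(q, k, 2000, 1997.5)

print(shift_invariant, round(near, 2), round(far, 2), round(repositioned, 2))
```

Only the position values change between `far` and `repositioned`. The token content is identical, which is why a learned position assignment alone can change what attention favors.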
Key Novelty
Context Re-Positioning (RePo)
  • Introduces a lightweight, differentiable module that predicts a continuous position value for each token based on its content, rather than its sequence index
  • Optimizes these predicted positions end-to-end using differentiable position encodings (like RoPE), allowing the model to 'move' relevant tokens closer together in attention space
  • Inspired by Cognitive Load Theory, it treats position assignment as a way to reduce extraneous load by organizing context more efficiently for the attention mechanism
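The mechanism in the list above can be sketched as a small function from hidden states to real-valued positions, followed by RoPE attention that uses those positions in place of integer indices. This is a forward-pass-only illustration in plain Python, and it rests on several assumptions. The class `HypotheticalRePositioner`, its two-layer SiLU MLP, its sizes and initialization, the helper `causal_attention_weights`, and the choice to keep the causal mask in the original token order are all illustrative, not the paper's exact design. Every step is smooth in the predicted positions. Written in an autodiff framework, the same computation would therefore pass gradients back into the predictor, which is what end-to-end optimization of positions relies on.

```python
import math
import random


def rope_rotate(vec, pos, base=10000.0):
    """Standard RoPE rotation of an even-length vector at real-valued `pos`."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.extend([vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c])
    return out


class HypotheticalRePositioner:
    """Hypothetical content-based position predictor (not the paper's exact module).

    Maps each hidden state h to one real-valued position with a tiny MLP:
    p = w2 . silu(W1 h + b1) + b2. Sizes and initialization are assumptions.
    """

    def __init__(self, d_model, d_hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 0.5) for _ in range(d_model)] for _ in range(d_hidden)]
        self.b1 = [0.0] * d_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(d_hidden)]
        self.b2 = 0.0

    def __call__(self, hidden_states):
        positions = []
        for h in hidden_states:
            z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.W1, self.b1)]
            act = [v / (1.0 + math.exp(-v)) for v in z]  # SiLU activation
            positions.append(sum(w * a for w, a in zip(self.w2, act)) + self.b2)
        return positions


def causal_attention_weights(queries, keys, positions):
    """Causal softmax attention weights, with RoPE driven by `positions`.

    Assumption: the causal mask still follows the original token order;
    only the positions fed to RoPE are replaced by predicted values.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    weights = []
    for t, q in enumerate(queries):
        q_rot = rope_rotate(q, positions[t])
        logits = [
            scale * sum(a * b for a, b in zip(q_rot, rope_rotate(keys[s], positions[s])))
            for s in range(t + 1)
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy usage: 5 tokens with 8-dim random hidden states standing in for activations.
rng = random.Random(1)
d_model = 8
hidden = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(5)]
repositioner = HypotheticalRePositioner(d_model)
positions = repositioner(hidden)  # real-valued positions instead of 0, 1, 2, ...
weights = causal_attention_weights(hidden, hidden, positions)  # reuse h as q and k
print([round(p, 3) for p in positions])
```

RoPE logits depend only on position differences. Adding the same offset to every predicted position therefore leaves the attention weights unchanged, so in this sketch only the relative layout produced by the predictor matters.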
Evaluation Highlights
  • +11.04 points average improvement over RoPE on the RULER benchmark (noisy context) within the training context length
  • Outperforms baselines by at least 13.25 EM points on QA and Needle-in-a-Haystack tasks when the context is extended to 16K tokens (4x the training length)
  • +5.48 points average improvement on LongBench compared to baselines, demonstrating superior long-context generalization
Breakthrough Assessment
8/10
Offers a fundamental rethinking of position embeddings—from fixed indices to dynamic, content-aware values. Significant gains in noise robustness and long-context generalization without heavy architectural changes.