
Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

B Li, T Tian, Z Xu, H Cheng, S Zhang, W Ye
Peking University (implied by GitHub handle pkuserc)
arXiv, 11/2025
RAG QA Factuality Reasoning

📝 Paper Summary

Modularized RAG pipeline
ETC decides when to retrieve by analyzing the first- and second-order differences of the token-level entropy sequence. This lets it detect emerging uncertainty trends before errors propagate.
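The sketch below shows the signals ETC works with. It assumes that token-level entropy is the Shannon entropy of each next-token distribution, because the summary does not spell out the exact formula. `token_entropy`, `differences`, and the toy distributions are hypothetical. The first difference gives the direction of entropy change, and the second difference gives its acceleration.

```python
import math
from typing import List, Sequence

def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def differences(seq: Sequence[float]) -> List[float]:
    """Discrete first difference: d[t] = seq[t] - seq[t-1]."""
    return [b - a for a, b in zip(seq, seq[1:])]

# Hypothetical next-token distributions for four generated tokens,
# growing progressively flatter (i.e. more uncertain).
dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.90, 0.05, 0.03, 0.02],
    [0.70, 0.15, 0.10, 0.05],
    [0.40, 0.30, 0.20, 0.10],
]

H = [token_entropy(d) for d in dists]  # entropy sequence H_1..H_T
dH = differences(H)                     # first difference: direction of change
d2H = differences(dH)                   # second difference: acceleration

print([round(h, 3) for h in H])
print([round(x, 3) for x in dH])
print([round(x, 3) for x in d2H])
```

In this toy sequence every first difference is positive, meaning uncertainty is rising. The second difference is positive and then negative, meaning the rise first speeds up and then slows.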
Core Problem
Existing dynamic RAG methods trigger retrieval reactively, only once token-level confidence is already low. By that point the model has often hallucinated or drifted from the correct path.
Why it matters:
  • Delayed retrieval leads to error propagation where subsequent generation is conditioned on incorrect prefixes
  • Heuristic-based triggers (e.g., fixed intervals) are inefficient, causing redundant retrievals and increased latency
  • Tracking isolated confidence values misses the dynamic evolution of uncertainty that signals impending model failure
Concrete Example: On 2WikiMultihopQA, a model answering a question about 'The Love Light' names incorrect directors before its confidence drops far enough to trigger standard baselines such as DRAGIN. By the time retrieval happens, the generation is already factually wrong.
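A simplified confidence-threshold trigger of the kind described under Core Problem makes the timing issue concrete. This is not the exact criterion used by DRAGIN or FLARE. `reactive_trigger`, the per-token probabilities, and the threshold of 0.4 are all hypothetical.

```python
from typing import Optional, Sequence

def reactive_trigger(token_probs: Sequence[float], threshold: float = 0.4) -> Optional[int]:
    """Return the index of the first generated token whose probability falls
    below `threshold`, or None if confidence never drops that low.
    The default threshold is a hypothetical value chosen for illustration."""
    for t, p in enumerate(token_probs):
        if p < threshold:
            return t
    return None

# Hypothetical probabilities of the sampled tokens: confidence erodes
# steadily but only crosses the threshold at the final step.
probs = [0.97, 0.90, 0.70, 0.55, 0.45, 0.35]
print(reactive_trigger(probs))  # → 5
```

Confidence had been falling steadily from the second token onward, yet the rule only fires at step 5. Every token generated in between is conditioned on an unverified prefix.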
Key Novelty
Entropy-Trend Constraint (ETC)
  • Models uncertainty dynamics using differential analysis: the first difference tracks the direction of entropy change, and the second difference captures its acceleration (the rate of change of the first difference), acting as a sensitive early-warning signal
  • Introduces Dynamic Smoothing to weigh recent entropy shifts against historical expectations, filtering out noisy outliers to prevent unnecessary retrieval
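The summary does not give ETC's exact decision rule, smoothing formula, or thresholds, so the following is only a hypothetical sketch of the idea. An exponential moving average stands in for Dynamic Smoothing. Retrieval fires when smoothed entropy has risen for two consecutive steps and the rise is accelerating. The names `smooth`, `trend_trigger`, and `threshold_trigger`, the two-consecutive-steps rule, and all threshold values are illustrative assumptions, not the paper's method.

```python
from typing import List, Optional, Sequence

def smooth(entropies: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Exponential moving average: each new entropy value is blended with the
    running expectation from earlier tokens (stand-in for Dynamic Smoothing;
    alpha, the weight on the newest value, is hypothetical)."""
    out: List[float] = []
    for h in entropies:
        out.append(h if not out else alpha * h + (1 - alpha) * out[-1])
    return out

def trend_trigger(entropies: Sequence[float], d1_min: float = 0.05,
                  d2_min: float = 0.0, alpha: float = 0.5) -> Optional[int]:
    """Fire at the first step where smoothed entropy has risen for two
    consecutive steps (first difference > d1_min) and the rise is
    accelerating (second difference > d2_min). Thresholds are hypothetical."""
    s = smooth(entropies, alpha)
    d1 = [0.0] + [s[t] - s[t - 1] for t in range(1, len(s))]
    for t in range(2, len(s)):
        d2 = d1[t] - d1[t - 1]
        if d1[t] > d1_min and d1[t - 1] > d1_min and d2 > d2_min:
            return t
    return None

def threshold_trigger(entropies: Sequence[float], h_max: float = 1.0) -> Optional[int]:
    """Reactive baseline for comparison: fire once raw entropy exceeds h_max."""
    for t, h in enumerate(entropies):
        if h > h_max:
            return t
    return None

# Toy entropy sequences (hypothetical values, not from the paper).
rising = [0.10, 0.12, 0.11, 0.20, 0.45, 0.90, 1.30]  # accelerating uncertainty
spike = [0.10, 0.10, 0.80, 0.10, 0.10]               # one noisy outlier

print(trend_trigger(rising), threshold_trigger(rising))  # → 5 6
print(trend_trigger(spike), threshold_trigger(spike))    # → None None
```

On the rising sequence, the trend rule fires one step before the raw-entropy threshold would. The isolated spike triggers neither rule. With real model outputs, how much earlier a trend rule fires depends entirely on the chosen smoothing weight and thresholds.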
Evaluation Highlights
  • +12.1% improvement on LLaMA2-7B over the strongest baselines across six benchmarks
  • Substantially reduces the delayed-retrieval ratio (10% vs. 33% for DRAGIN in a manual evaluation on 2WikiMultihopQA)
  • Achieves higher performance with fewer retrieval operations than dynamic baselines like FLARE and DRAGIN
Breakthrough Assessment
7/10
Simple yet effective training-free method that addresses a fundamental flaw in dynamic RAG (latency of intervention). Strong empirical results across diverse benchmarks.