
SR-RAG: Binding Selective Retrieval with Knowledge Verbalization

Unknown authors
University of California, Los Angeles
RAG QA RL

📝 Paper Summary

Modularized RAG pipeline
SR-RAG reformulates selective retrieval as a routing problem in which an LLM dynamically chooses between retrieving external documents and verbalizing its own parametric knowledge before answering.
Core Problem
When existing selective retrieval methods skip retrieval, they fall back to standard generation, which fails to use the LLM's ability to explicitly articulate (verbalize) its internal knowledge.
Why it matters:
  • Current fallbacks cap the achievable performance when the model abstains from retrieval, because LLMs perform better when they explicitly reason or recite knowledge
  • Training labels for selective retrieval are often inaccurate because they underestimate the LLM's internal capabilities without explicit knowledge elicitation
  • Standard RAG systems suffer from high latency and distraction from low-quality retrieved documents
Concrete Example: For the query 'Who succeeded the first President of Namibia?', a standard LLM might fail when answering directly. However, if prompted to verbalize its knowledge first ('Namibia has had four presidents...'), it can answer correctly without external retrieval. SR-RAG captures this capability to avoid unnecessary retrieval.
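The routing idea in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `sr_rag_answer`, `choose_source`, `retrieve`, `verbalize`, and `answer` are hypothetical names, and the source labels "wiki" and "self" are assumed stand-ins for the external and internal knowledge sources.

```python
from typing import Callable, List


def sr_rag_answer(
    query: str,
    choose_source: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    verbalize: Callable[[str], str],
    answer: Callable[[str, str], str],
) -> str:
    """Route a query to one knowledge source, then answer with that context.

    All callables are hypothetical placeholders for model or retriever calls.
    """
    source = choose_source(query)  # assumed to return "wiki" or "self"
    if source == "wiki":
        # External branch: fetch passages and join them into one context.
        context = "\n".join(retrieve(query))
    elif source == "self":
        # Internal branch: have the model write out what it already knows.
        context = verbalize(query)
    else:
        raise ValueError(f"unknown source: {source!r}")
    # Either way, the final answer is conditioned on an explicit context.
    return answer(query, context)
```

The point of the sketch is that both branches produce a context before answering; skipping retrieval no longer means answering with no context at all.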
Key Novelty
Self-Routing Retrieval-Augmented Generation (SR-RAG)
  • Reformulate selective retrieval as a 'knowledge source selection' problem where the model chooses between external sources (Wikipedia) and internal sources (Self)
  • Incorporate explicit 'knowledge verbalization' (generating background context from memory) when retrieval is skipped, rather than just generating the answer directly
  • Use a nearest-neighbor (kNN) inference policy to dynamically adjust the selection decision based on the hidden states of similar past queries
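A nearest-neighbor selection policy of the kind described in the last bullet might look like the sketch below. It assumes a plain majority vote over the k most similar stored queries under cosine similarity; the paper's actual policy may weight or combine signals differently. The names `knn_route` and `cosine`, the memory layout, and the "wiki" default on ties are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_route(
    query_state: Sequence[float],
    memory: List[Tuple[Sequence[float], str]],
    k: int = 3,
    default: str = "wiki",
) -> str:
    """Pick a knowledge source by majority vote among the k nearest past queries.

    `memory` holds (hidden_state, best_source) pairs for past queries.
    Falls back to `default` when memory is empty or the vote is tied.
    """
    if not memory:
        return default
    ranked = sorted(memory, key=lambda item: cosine(query_state, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return default  # tie: assumed conservative fallback to retrieval
    return votes[0][0]
```

With such a policy, a query whose hidden state resembles past queries that were answered well from memory is routed to verbalization, and everything else goes to retrieval.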
Architecture
Figure 1: Overview of the SR-RAG inference pipeline, showing the decision branch between External Retrieval and Knowledge Verbalization.
Evaluation Highlights
  • Outperforms vanilla selective retrieval by 7.9% (PopQA), 2.1% (TriviaQA), and 4.7% (PubHealth) while performing significantly fewer retrievals
  • Reduces retrieval volume by 26% to 40% compared to strong selective retrieval baselines while maintaining or improving accuracy
  • Achieves better accuracy-latency trade-offs: as verbalization increases, system latency decreases linearly while maintaining high accuracy
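One simple way to see why latency could fall linearly with the share of verbalized queries is a two-cost mixture model. The sketch below assumes each query costs a fixed `t_retrieve` on the retrieval path and a fixed, smaller `t_verbalize` on the verbalization path; both parameters and the function name are hypothetical, and no measured values from the paper are used.

```python
def expected_latency(verbalize_fraction: float, t_retrieve: float, t_verbalize: float) -> float:
    """Mean per-query latency under an assumed two-cost mixture model.

    verbalize_fraction: share of queries routed to verbalization, in [0, 1].
    t_retrieve, t_verbalize: assumed constant per-query costs (same time unit).
    """
    if not 0.0 <= verbalize_fraction <= 1.0:
        raise ValueError("verbalize_fraction must be in [0, 1]")
    # Linear in verbalize_fraction with slope (t_verbalize - t_retrieve).
    return (1.0 - verbalize_fraction) * t_retrieve + verbalize_fraction * t_verbalize
```

Under these assumptions the slope is `t_verbalize - t_retrieve`, so latency decreases linearly only when verbalization is the cheaper path.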
Breakthrough Assessment
7/10
Strong conceptual advance in treating 'self' as a distinct RAG source, with significant efficiency gains. However, it relies on existing verbalization techniques (GenRead) and standard dense retrieval.