
Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach

C Huang, R Wang, K Xie, T Yu, L Yao
University of New South Wales, University of California San Diego, Duke University, Georgia Institute of Technology, Adobe Research, CSIRO’s Data61
arXiv, April 2024
RAG QA Factuality

📝 Paper Summary

Modularized RAG pipeline
EI-ARAG (Embedding-Informed Adaptive Retrieval-Augmented Generation) predicts whether retrieval is needed from the pre-trained token embeddings at the model's first contextualized layer, so it requires neither access to the pre-training data nor an extra inference call.
Core Problem
Retrieving external information when an LLM is already knowledgeable about a query is inefficient and can degrade response quality due to noisy context.
Why it matters:
  • Previous heuristics rely on entity frequency in pre-training corpora, which requires access to proprietary training data and fails on non-entity-centric questions.
  • Prompting-based adaptive methods (asking the LLM 'do you need help?') double the inference cost and are often unreliable due to LLM overconfidence.
Concrete Example: For the question 'Who is the mother of Melissa Benn?', a prompting-based method (PARAG-TAARE) wrongly decides no retrieval is needed and hallucinates 'Hilary Mantel'. EI-ARAG detects the need for retrieval based on embeddings, retrieves the correct context, and answers 'Caroline Benn'.
Key Novelty
Embedding-Informed Adaptive Retrieval-Augmented Generation (EI-ARAG)
  • Leverages the hypothesis that pre-trained token embeddings intrinsically capture concept frequency and model knowledge confidence.
  • Uses a lightweight classifier on the first contextualized embedding layer to predict retrieval necessity, rather than prompting the full model.
  • Eliminates the need for accessing original pre-training data frequencies or performing dual inference passes.
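The gating idea in the bullets above can be sketched as a small binary classifier that sits in front of the generator. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify the classifier architecture, so plain logistic regression is swapped in; the feature vectors are synthetic stand-ins for pooled first-layer embeddings (extracting real embeddings from an LLM is omitted); and every name here (`train_retrieval_gate`, `needs_retrieval`, `make_toy_data`, `answer`) is hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(x, w, b):
    # Linear score of one feature vector.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_retrieval_gate(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression gate with batch gradient descent.

    features: equal-length float vectors (assumed stand-ins for pooled
              first-layer embeddings of each question).
    labels:   1 if the model needed retrieval to answer, else 0.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(_score(x, w, b)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def needs_retrieval(x, w, b, threshold=0.5):
    """Return True when the gate predicts the model lacks the knowledge."""
    return _sigmoid(_score(x, w, b)) >= threshold


def make_toy_data(n_per_class=20, dim=4, seed=0):
    """Synthetic, hypothetical data: dimension 0 separates the classes."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, center in ((0, 1.5), (1, -1.5)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            x[0] += center
            features.append(x)
            labels.append(label)
    return features, labels


def answer(query_vec, w, b, generate, retrieve):
    """Route one query: fetch context only when the gate asks for it."""
    context = retrieve() if needs_retrieval(query_vec, w, b) else None
    return generate(context)
```

In use, the gate would be trained once on questions labeled by whether the model answered them correctly without retrieval, and `answer` would then be called per query with the caller's own `generate` and `retrieve` functions. The gate is a single cheap forward computation over an embedding, which is why it does not need a second pass through the full model.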
Evaluation Highlights
  • +11.61% accuracy over the No Retrieval baseline on PopQA with LLaMA 2 7B, while retrieving for only 57.89% of queries.
  • Outperforms the prompting-based baseline PARAG-TAARE by +3.87% accuracy on PopQA while reducing retrieval frequency by ~37 percentage points.
  • Achieves inference latency of ~0.04s per decision vs. ~0.39s for prompting-based methods on LLaMA 2 7B.
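To put the reported figures in perspective, the short calculation below derives two quantities from them: the per-decision speedup (roughly 9.75x) and the retrieval rate implied for PARAG-TAARE (about 95%). Both inputs are approximate in the source, and the second result assumes the ~37-point reduction is measured against the 57.89% figure in the same PopQA setting, so treat the outputs as rough estimates. The function and constant names are illustrative only.

```python
# Figures quoted in the highlights above (PopQA, LLaMA 2 7B); all approximate.
EI_ARAG_DECISION_S = 0.04       # seconds per retrieval decision
PROMPTING_DECISION_S = 0.39     # seconds per retrieval decision
EI_ARAG_RETRIEVAL_PCT = 57.89   # percent of queries sent to retrieval
REDUCTION_PP = 37.0             # percentage-point reduction vs. PARAG-TAARE


def decision_speedup(fast_s, slow_s):
    """How many times faster the cheaper decision is."""
    return slow_s / fast_s


def implied_baseline_retrieval_pct(pct, reduction_pp):
    """Baseline retrieval rate implied by a percentage-point reduction."""
    return pct + reduction_pp


speedup = decision_speedup(EI_ARAG_DECISION_S, PROMPTING_DECISION_S)
baseline_pct = implied_baseline_retrieval_pct(EI_ARAG_RETRIEVAL_PCT, REDUCTION_PP)
print(f"decision speedup: about {speedup:.2f}x")
print(f"implied PARAG-TAARE retrieval rate: about {baseline_pct:.2f}%")
```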
Breakthrough Assessment
7/10
Offers a highly efficient alternative to prompting for adaptive RAG. While the accuracy gains are modest in some cases, the latency reduction and removal of dependency on pre-training data are significant practical contributions.