
Enhancing retrieval and managing retrieval: A four-module synergy for improved quality and efficiency in RAG systems

Y Shi, X Zi, Z Shi, H Zhang, Q Wu, M Xu
University of Technology Sydney
arXiv, July 2024
RAG Memory Factuality QA

📝 Paper Summary

Modularized RAG pipeline
ERM4 enhances RAG accuracy and efficiency by decomposing query rewriting into clarification and multi-query generation, filtering retrieved noise via natural language inference, and caching results to avoid redundant retrieval.
Core Problem
Standard RAG systems suffer from four key issues: information plateaus due to single-query limits, ambiguity in user questions, low precision of retrieved knowledge (noise), and inefficient redundant retrieval for similar queries.
Why it matters:
  • Single-query retrieval hits a ceiling (information plateau) where adding more documents doesn't help because the query scope is limited
  • Ambiguous user queries lead LLMs to generate vague or irrelevant answers
  • Retrieving irrelevant documents introduces noise that degrades generation quality
  • Repeatedly searching for the same or similar information wastes computational resources and increases latency
Concrete Example: In datasets like CAmbigNQ, a vague user question often prompts an LLM to list all possible interpretations rather than give a specific answer. Additionally, preliminary studies show that with 30 retrieved snippets, 'Snippet Precision' drops significantly, meaning most of the retrieved text is irrelevant noise that confuses the generator.
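To make the noise problem concrete, the sketch below computes a simple snippet-precision score. The definition used here (the fraction of retrieved snippets that contain at least one gold answer string) is an assumption for illustration; this summary does not state how the paper defines 'Snippet Precision', and the function name `snippet_precision` is hypothetical.

```python
from typing import List


def snippet_precision(snippets: List[str], gold_answers: List[str]) -> float:
    """Fraction of snippets containing any gold answer (assumed definition)."""
    if not snippets:
        return 0.0
    golds = [g.lower() for g in gold_answers]
    # A snippet counts as relevant if any gold answer appears in it as a substring.
    hits = sum(any(g in s.lower() for g in golds) for s in snippets)
    return hits / len(snippets)
```

Under this definition, a low score over a large retrieved set means the generator receives mostly text that does not contain the answer.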
Key Novelty
Four-Module Synergistic RAG Enhancement (ERM4)
  • Query Rewriter+: Splits rewriting into two concurrent tasks: clarifying the intent of the original question and generating multiple diverse search queries to break information plateaus.
  • Knowledge Filter: Uses a Natural Language Inference (NLI) model to judge if retrieved text entails the answer, actively discarding irrelevant noise before generation.
  • Memory Knowledge Reservoir & Trigger: Caches effective knowledge pairs and uses a popularity-based calibration to decide when to fetch from cache vs. trigger a new external search.
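The interaction of the three modules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rewrite`, `search`, and `entails` are hypothetical callables standing in for the query rewriter, the search engine, and the NLI model, and a fixed token-overlap threshold is swapped in for the popularity-based calibration, whose details this summary does not give.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for an embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class MemoryReservoir:
    """Caches query -> filtered snippets from earlier searches (hypothetical)."""
    store: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, query: str, threshold: float) -> Optional[List[str]]:
        """Return cached snippets for the most similar past query, or None."""
        best_key, best_sim = None, 0.0
        for key in self.store:
            sim = jaccard(query, key)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is not None and best_sim >= threshold:
            return self.store[best_key]
        return None


def build_context(
    question: str,
    rewrite: Callable[[str], Tuple[str, List[str]]],  # -> (clarified question, queries)
    search: Callable[[str], List[str]],               # query -> retrieved snippets
    entails: Callable[[str, str], bool],              # (snippet, question) -> keep?
    memory: MemoryReservoir,
    threshold: float = 0.8,                           # assumed fixed cut-off
) -> Tuple[str, List[str]]:
    """Rewrite, then reuse cached knowledge or search and filter per query."""
    clarified, queries = rewrite(question)
    context: List[str] = []
    for q in queries:
        kept = memory.lookup(q, threshold)
        if kept is None:
            # Retrieval trigger fires: no sufficiently similar query is cached.
            kept = [s for s in search(q) if entails(s, clarified)]
            memory.store[q] = kept
        # Merge while dropping duplicates across the multiple queries.
        context.extend(s for s in kept if s not in context)
    return clarified, context
```

Calling `build_context` twice with the same question should issue external searches only on the first call; the second is served entirely from `memory`, which is the behavior the caching module is meant to provide.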
Architecture
Figure 1: The ERM4 framework workflow, illustrating the interaction between the User, Query Rewriter+, Search Engine, Knowledge Filter, Memory Knowledge Reservoir, and Retrieval Trigger.
Evaluation Highlights
  • Achieves a 5%-10% increase in answer accuracy (Exact Match/F1) compared to direct inquiry across six QA datasets
  • Reduces response time by 46% for historically similar questions using the Memory Knowledge Reservoir without compromising quality
  • Query Rewriter+ and Knowledge Filter consistently improve performance over standard Rewrite-Retrieve-Read pipelines on PopQA, 2WikiMQA, and HotpotQA
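For readers unfamiliar with the accuracy metrics named above, the sketch below shows Exact Match and token-level F1 as they are commonly computed for extractive QA (SQuAD-style normalization: lowercasing, stripping punctuation and English articles). Whether the paper uses exactly this normalization is an assumption; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact Match rewards only fully correct answers, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.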
Breakthrough Assessment
6/10
Solid engineering improvements to the RAG pipeline. The combination of multi-query generation, NLI-based filtering, and caching is practical and effective, though the individual components (rewriting, NLI filtering) are established concepts.