Renmin University of China,
Southeast University,
Beijing Jiaotong University
arXiv, 11/2024
(2024)
RAG, Factuality, QA
📝 Paper Summary
Modularized RAG pipeline
DMQR-RAG improves retrieval by using diverse rewriting strategies (extraction, expansion, keyword, general) and an adaptive selector that chooses the best strategy mix for each query.
Core Problem
Single-query rewriting lacks diversity, and existing multi-query methods often produce near-identical rewrites that fail to retrieve distinct relevant documents for complex queries.
Why it matters:
User queries often contain noise or intent deviations that direct retrieval cannot handle effectively.
Static knowledge in LLMs leads to hallucinations, requiring reliable external retrieval.
Existing prompt-based rewriting methods are often limited to specific query types (e.g., multi-hop) and lack generalization for diverse real-world inputs.
Concrete Example: Consider the multi-hop query 'Where are the authors of the Transformer paper currently working?' and the general query 'What is the citation count for the Transformer paper?'. A fixed rewriting strategy that suits one may fail on the other, so DMQR-RAG adaptively selects different strategies for each.
Key Novelty
Information-based Diverse Multi-Query Rewriting (DMQR)
Defines four distinct rewriting strategies based on information flow: General Denoising, Keyword Extraction, Pseudo-Answer Expansion (adding priors), and Core Content Extraction (reducing detail).
Uses an adaptive selector (LLM-based) to dynamically choose which of these strategies to apply for a given query, minimizing noise while maximizing retrieval coverage.
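A minimal sketch of how the four strategies and the selector could fit together, assuming a hypothetical `llm` callable that maps a prompt string to a completion string. The prompt wording and the names `STRATEGY_PROMPTS`, `select_strategies`, and `rewrite_query` are illustrative only; the paper's actual prompts are not reproduced here. Keeping the original query alongside the rewrites and falling back to GQR are assumptions of this sketch.

```python
from typing import Callable, Dict, List

# Hypothetical one-line instructions; NOT the paper's prompts.
STRATEGY_PROMPTS: Dict[str, str] = {
    "GQR": "Rewrite the query to remove noise and clarify intent: {query}",
    "KWR": "Extract the search keywords from the query: {query}",
    "PAR": "Write a short hypothetical answer to the query: {query}",
    "CCE": "Reduce the query to its core content: {query}",
}


def select_strategies(query: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for a comma-separated list of strategy codes; keep valid ones."""
    prompt = (
        "Choose the rewriting strategies suited to this query from "
        + ", ".join(STRATEGY_PROMPTS)
        + ". Answer with a comma-separated list of codes.\nQuery: "
        + query
    )
    chosen: List[str] = []
    for token in llm(prompt).split(","):
        code = token.strip().upper()
        if code in STRATEGY_PROMPTS and code not in chosen:
            chosen.append(code)
    # Fallback (an assumption): use general rewriting if nothing valid came back.
    return chosen or ["GQR"]


def rewrite_query(query: str, llm: Callable[[str], str]) -> List[str]:
    """Return the original query plus one rewrite per selected strategy."""
    rewrites = [query]  # keeping the original query is an assumption
    for code in select_strategies(query, llm):
        rewrites.append(llm(STRATEGY_PROMPTS[code].format(query=query)).strip())
    return rewrites
```

Because the selector returns only a subset of the four codes, the number of rewrite calls varies per query instead of always being four.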
Architecture
Comparison of Traditional RAG, Query Rewriting, and DMQR-RAG workflows.
Evaluation Highlights
Achieves higher recall and retrieval performance than RAG-Fusion and single-query baselines.
Adaptive selection reduces the number of queries needed while maintaining or improving performance.
Validates effectiveness across both academic benchmarks and industry settings.
Breakthrough Assessment
6/10
Offers a sensible, structured approach to query rewriting with adaptive selection. While effective, it relies on prompting existing LLMs rather than a fundamental architectural shift.
⚙️ Technical Details
Problem Definition
Setting: Retrieval-Augmented Generation where a query q is rewritten into a set of queries q' to retrieve documents D for generating answer A.
Inputs: User query q
Outputs: Final generated response A based on retrieved documents
Generator (Original Query + Top-K Documents → Final Answer)
System Modules
Adaptive Strategy Selector (Input Processing)
Selects the most suitable rewriting strategies (GQR, KWR, PAR, CCE) for the specific query
Model or implementation: LLM (Prompt-based with few-shot examples)
Multi-Query Rewriter (Input Processing)
Generates rewrites according to the selected strategies
Model or implementation: LLM (Prompt-based)
Retriever (Retrieval & Selection)
Retrieves documents for each query in the set q'
Model or implementation: Bing Search Engine (Black box)
Reranker (Retrieval & Selection)
Reranks the combined set of retrieved documents
Model or implementation: BAAI-BGE-reranker
Generator
Generates the final answer using the top documents
Model or implementation: LLM
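The retrieval and reranking modules listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `search` and `score` are hypothetical callables standing in for the search engine and the reranker, and exact-match de-duplication and scoring against the original query are choices made for this sketch.

```python
from typing import Callable, List, Sequence


def retrieve_and_rerank(
    original_query: str,
    queries: Sequence[str],
    search: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[str]:
    """Pool documents from every query, drop duplicates, keep the top_k by score."""
    pooled: List[str] = []
    seen = set()
    for q in queries:
        for doc in search(q):
            if doc not in seen:  # exact-match de-duplication (an assumption)
                seen.add(doc)
                pooled.append(doc)
    # Score each pooled document against the original query (an assumption),
    # highest score first.
    ranked = sorted(pooled, key=lambda d: score(original_query, d), reverse=True)
    return ranked[:top_k]
```

The returned documents, together with the original query, would then be passed to the generator.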
Novel Architectural Elements
Integration of an adaptive selector module that dynamically chooses rewriting strategies based on query characteristics before the rewriting step
Distinct information-based rewriting strategies (GQR, KWR, PAR, CCE) designed to diversify the retrieved documents rather than merely paraphrase the query
Modeling
Base Model: Evaluated with several LLMs (specific models are not named in the available text), which suggests the approach generalizes across models
Training Method: Prompt Engineering / In-Context Learning
Compute: Not reported in the paper
Comparison to Prior Work
vs. RAG-Fusion: DMQR-RAG uses semantically distinct strategies (keywords, pseudo-answers) rather than just variations of the original query, ensuring higher document diversity.
vs. HyDE: HyDE is a single strategy; DMQR-RAG incorporates pseudo-answers as just one of several selectable strategies.
vs. Training-based methods (RRR, RQ-RAG): DMQR-RAG is prompt-based and does not require expensive training or dataset construction.
Limitations
Relies on the capabilities of the underlying LLM; weak LLMs may fail to rewrite effectively.
The 'black box' retriever (Bing) makes direct comparison with dense retrieval benchmarks difficult without normalization.
Reproducibility
Prompt templates for the rewriting strategies are reported to be in the paper's Appendix (not included in the available text). The available text does not state whether code is released.
📊 Experiments & Results
Evaluation Setup
RAG pipeline using Bing Search for retrieval and BGE-reranker for ranking.
Benchmarks:
Academic and Industry datasets (General Question Answering)
Metrics:
Recall
Relevance (implied by 'performance')
Statistical methodology: Not explicitly reported in the paper
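For reference, a generic recall-at-k computation looks like the sketch below. The exact metric definitions used in the paper are not given in the available text, so this is the textbook form only; the function name and the choice to return 0.0 when there are no relevant documents are assumptions.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # convention chosen for this sketch
    hits = relevant_set.intersection(retrieved[:k])
    return len(hits) / len(relevant_set)
```

With multiple rewrites, `retrieved` would be the merged and reranked list, which is how additional distinct documents can raise recall over a single query.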
Key Results
The paper claims multi-query rewriting outperforms single-query, and DMQR outperforms RAG-Fusion, but specific numeric tables were not included in the provided text snippet.
Main Takeaways
Multi-query rewriting generally outperforms single-query rewriting by increasing document recall.
Information-based rewriting strategies (DMQR) surpass vanilla RAG-Fusion by ensuring retrieved documents are more diverse.
Adaptive strategy selection maintains high performance while reducing the total number of necessary rewrites compared to using all strategies blindly.
📚 Prerequisite Knowledge
Prerequisites
Understanding of RAG pipelines (Retrieval-Augmented Generation)
Familiarity with prompt engineering for LLMs
Basic knowledge of search relevance metrics (Recall, MRR)
Key Terms
RAG: Retrieval-Augmented Generation—AI systems that answer questions by first searching for relevant documents
RAG-Fusion: A method that generates multiple queries and uses Reciprocal Rank Fusion to re-rank retrieved documents
LLM: Large Language Model—a type of AI model trained on vast amounts of text data to understand and generate human language
General Query Rewriting (GQR): Refines the original query to remove noise and clarify intent
Keyword Rewriting (KWR): Extracts keywords (nouns, subjects) to align with search engine preferences
Pseudo-Answer Rewriting (PAR): Generates a hypothetical answer to the query to use for semantic retrieval
Core Content Extraction (CCE): Simplifies the query by removing superfluous details to focus on key information
Reciprocal Rank Fusion: An algorithm that combines multiple ranked lists into a single ranking
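Reciprocal Rank Fusion, as used by the RAG-Fusion baseline, scores each document by summing 1 / (k + rank) over every ranked list in which it appears. The sketch below uses 1-indexed ranks and the commonly used constant k = 60; the function name is illustrative. DMQR-RAG itself, per the module list above, reranks with a BGE reranker rather than with this fusion step.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists into one, ordered by summed reciprocal-rank score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A document that ranks moderately well in several lists can outrank one that is first in a single list, which is why the method rewards agreement across query variants.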