DMQR-RAG: Diverse Multi-Query Rewriting for RAG

📝 Paper Summary

Modularized RAG pipeline

DMQR-RAG improves retrieval by combining four diverse rewriting strategies (general rewriting, keyword rewriting, pseudo-answer expansion, and core content extraction) with an adaptive selector that chooses which strategies to apply to each query.

Core Problem

Single-query rewriting lacks diversity, and existing multi-query methods often produce near-identical rewrites that fail to retrieve distinct relevant documents for complex queries.

Why it matters:

User queries often contain noise or intent deviations that direct retrieval cannot handle effectively.
Static knowledge in LLMs leads to hallucinations, requiring reliable external retrieval.
Existing prompt-based rewriting methods are often limited to specific query types (e.g., multi-hop) and lack generalization for diverse real-world inputs.

Concrete Example: 'Where are the authors of the Transformer paper currently working?' is a multi-hop query, while 'What is the citation count for the Transformer paper?' is a general one. A fixed rewriting strategy that suits one may fail on the other; DMQR-RAG adaptively selects different strategies for each.

Key Novelty

Information-based Diverse Multi-Query Rewriting (DMQR)

Defines four distinct rewriting strategies based on information flow: General Query Rewriting (GQR, denoising), Keyword Rewriting (KWR, keyword extraction), Pseudo-Answer Rewriting (PAR, expansion that adds priors), and Core Content Extraction (CCE, reducing detail).
Uses an adaptive selector (LLM-based) to dynamically choose which of these strategies to apply for a given query, minimizing noise while maximizing retrieval coverage.

Architecture

Comparison of Traditional RAG, Query Rewriting, and DMQR-RAG workflows.

Evaluation Highlights

Achieves higher recall and retrieval performance than RAG-Fusion and single-query baselines.
Adaptive selection reduces the number of queries needed while maintaining or improving performance.
Validates effectiveness across both academic benchmarks and industry settings.

Breakthrough Assessment

6/10

Offers a sensible, structured approach to query rewriting with adaptive selection. While effective, it relies on prompting existing LLMs rather than a fundamental architectural shift.

⚙️ Technical Details

Problem Definition

Setting: Retrieval-Augmented Generation where a query q is rewritten into a set of queries q' to retrieve documents D for generating answer A.

Inputs: User query q

Outputs: Final generated response A based on retrieved documents

Pipeline Flow

Adaptive Strategy Selector (Input Query → Selected Strategies)
Rewriter (Input Query + Strategies → Multiple Rewritten Queries)
Retriever (Rewritten Queries → Document Sets)
Reranker (Document Sets → Top-K Documents)
Generator (Original Query + Top-K Documents → Final Answer)
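
The five stages above can be wired together as in the following minimal sketch. Every callable passed in (`select`, `rewrite`, `retrieve`, `rerank`, `generate`) is a hypothetical stand-in for the LLM, search engine, or reranker; the function name `dmqr_rag` and the default `top_k` are assumptions made for illustration, not the paper's implementation.

```python
from typing import Callable, List


def dmqr_rag(
    query: str,
    select: Callable[[str], List[str]],
    rewrite: Callable[[str, str], str],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[List[str]], int], List[str]],
    generate: Callable[[str, List[str]], str],
    top_k: int = 5,  # assumed cutoff; the summary does not state K
) -> str:
    """Run the selector -> rewriter -> retriever -> reranker -> generator flow."""
    strategies = select(query)                          # Input Query -> Selected Strategies
    rewrites = [rewrite(query, s) for s in strategies]  # one rewrite per selected strategy
    doc_sets = [retrieve(r) for r in rewrites]          # one document set per rewrite
    top_docs = rerank(query, doc_sets, top_k)           # Document Sets -> Top-K Documents
    return generate(query, top_docs)                    # Original Query + Top-K -> Answer


# Toy stubs that only demonstrate the data flow.
answer = dmqr_rag(
    "who wrote the transformer paper",
    select=lambda q: ["KWR", "GQR"],
    rewrite=lambda q, s: f"[{s}] {q}",
    retrieve=lambda r: [f"doc for {r}"],
    rerank=lambda q, sets, k: [d for docs in sets for d in docs][:k],
    generate=lambda q, docs: f"answer built from {len(docs)} documents",
)
print(answer)
```

Passing the components in as callables keeps the sketch independent of any particular LLM or search API, and makes each stage easy to swap or stub in tests.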

System Modules

Adaptive Strategy Selector (Input Processing)

Selects the most suitable rewriting strategies (GQR, KWR, PAR, CCE) for the specific query

Model or implementation: LLM (Prompt-based with few-shot examples)
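
A prompt-based selector of this kind might look like the sketch below. The few-shot queries and their labels are invented for illustration (the paper's actual prompt is not reproduced in this summary), and the fallback to all four strategies when nothing can be parsed is an assumed design choice.

```python
from typing import List

STRATEGIES = ("GQR", "KWR", "PAR", "CCE")

# Hypothetical few-shot examples; neither the queries nor the labels come from the paper.
FEW_SHOT = [
    ("uh so what is like the tallest mountain in south america??", ["GQR", "KWR"]),
    ("why does bread dough need to rest before baking", ["PAR", "CCE"]),
]


def build_selector_prompt(query: str) -> str:
    """Assemble a few-shot prompt asking the LLM to pick strategies for `query`."""
    lines = [
        "Choose the rewriting strategies that suit the query.",
        "Options: " + ", ".join(STRATEGIES),
        "",
    ]
    for example_query, labels in FEW_SHOT:
        lines += [f"Query: {example_query}", "Strategies: " + ", ".join(labels), ""]
    lines += [f"Query: {query}", "Strategies:"]
    return "\n".join(lines)


def parse_selection(llm_output: str) -> List[str]:
    """Keep only known strategy names, in canonical order.

    If the output contains no known name, fall back to all strategies
    (an assumption: this degrades to non-adaptive multi-query rewriting).
    """
    tokens = {t.strip().upper() for t in llm_output.replace("\n", ",").split(",")}
    chosen = [s for s in STRATEGIES if s in tokens]
    return chosen or list(STRATEGIES)
```

Validating the LLM output against a fixed set of names guards against the selector returning strategies that do not exist.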

Multi-Query Rewriter (Input Processing)

Generates rewrites according to the selected strategies

Model or implementation: LLM (Prompt-based)
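
As a rough sketch of how one rewrite per strategy could be produced, the code below maps each strategy to an instruction and sends one prompt per strategy. The instruction strings are illustrative paraphrases of the strategy definitions in this summary, not the paper's prompts, and `llm` is a hypothetical callable that takes a prompt and returns text.

```python
from typing import Callable, Dict, List

# Illustrative instructions only; the paper's prompt templates are not reproduced here.
INSTRUCTIONS: Dict[str, str] = {
    "GQR": "Rewrite the query to remove noise and clarify its intent.",
    "KWR": "Extract the keywords a search engine would need, separated by spaces.",
    "PAR": "Write a short hypothetical answer to the query.",
    "CCE": "Remove superfluous detail and keep only the core question.",
}


def build_rewrite_prompt(strategy: str, query: str) -> str:
    """Return a single-strategy rewriting prompt; raises KeyError for unknown strategies."""
    return f"{INSTRUCTIONS[strategy]}\n\nQuery: {query}\nRewrite:"


def rewrite_all(query: str, strategies: List[str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Call the LLM once per selected strategy and collect the stripped rewrites."""
    return {s: llm(build_rewrite_prompt(s, query)).strip() for s in strategies}
```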

Retriever (Retrieval & Selection)

Retrieves documents for each query in the set q'

Model or implementation: Bing Search Engine (Black box)

Reranker (Retrieval & Selection)

Reranks the combined set of retrieved documents

Model or implementation: BAAI-BGE-reranker
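
The merge-then-rerank step can be sketched as follows. The scorer is pluggable: `overlap_score` is a toy token-overlap function used only so the example runs without a model, and is not how a BGE reranker scores pairs. The function names and `top_k` default are assumptions.

```python
from typing import Callable, List, Sequence


def merge_and_rerank(
    query: str,
    doc_sets: Sequence[Sequence[str]],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[str]:
    """Union the per-rewrite document sets, drop duplicates, and keep the top_k by score."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    merged = list(dict.fromkeys(doc for docs in doc_sets for doc in docs))
    ranked = sorted(merged, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer: fraction of query tokens that also appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0
```

Scoring against the original query rather than the rewrites keeps the final ranking tied to what the user actually asked, which matters when a rewrite such as a pseudo-answer drifts from the original wording.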

Generator

Generates the final answer using the top documents

Model or implementation: LLM

Novel Architectural Elements

Integration of an adaptive selector module that dynamically chooses rewriting strategies based on query characteristics before the rewriting step
Distinct information-based rewriting strategies (GQR, KWR, PAR, CCE) designed specifically for diversity rather than mere paraphrase

Modeling

Base Model: Evaluated with several LLMs (the specific models are not detailed in the snippet), which suggests the method is not tied to a single model

Training Method: Prompt Engineering / In-Context Learning

Compute: Not reported in the paper

Comparison to Prior Work

vs. RAG-Fusion: DMQR-RAG uses semantically distinct strategies (keywords, pseudo-answers) rather than just variations of the original query, ensuring higher document diversity.
vs. HyDE: HyDE is a single strategy; DMQR-RAG incorporates pseudo-answers as just one of several selectable strategies.
vs. Training-based methods (RRR, RQ-RAG): DMQR-RAG is prompt-based and does not require expensive training or dataset construction.

Limitations

Relies on the capabilities of the underlying LLM; weak LLMs may fail to rewrite effectively.
Latency increases with multiple retrieval calls (though adaptive selection mitigates this).
The 'black box' retriever (Bing) makes direct comparison with dense retrieval benchmarks difficult without normalization.

Reproducibility

Prompt templates for the rewriting strategies are said to be in the paper's Appendix (not included in the snippet). Code availability is not explicitly stated in the provided text.

📊 Experiments & Results

Evaluation Setup

RAG pipeline using Bing Search for retrieval and BGE-reranker for ranking.

Benchmarks:

Academic and Industry datasets (General Question Answering)

Metrics:

Recall
Relevance (implied by 'performance')
Statistical methodology: Not explicitly reported in the paper
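
For reference, recall at a cutoff k is commonly computed as below. This is the standard textbook definition; the summary does not state the exact metric variant or cutoff the paper uses, so treat it as an assumption.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # convention chosen here: no relevant documents means recall 0
    hits = relevant_set & set(retrieved[:k])
    return len(hits) / len(relevant_set)
```

Because multi-query rewriting pools documents from several searches, recall is the metric most directly affected: a relevant document only needs to be surfaced by one of the rewrites.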

Key Results

The paper claims multi-query rewriting outperforms single-query, and DMQR outperforms RAG-Fusion, but specific numeric tables were not included in the provided text snippet.

Main Takeaways

Multi-query rewriting generally outperforms single-query rewriting by increasing document recall.
Information-based rewriting strategies (DMQR) surpass vanilla RAG-Fusion by ensuring retrieved documents are more diverse.
Adaptive strategy selection maintains high performance while reducing the total number of necessary rewrites compared to using all strategies blindly.

📚 Prerequisite Knowledge

Prerequisites

Understanding of RAG pipelines (Retrieval-Augmented Generation)
Familiarity with prompt engineering for LLMs
Basic knowledge of search relevance metrics (Recall, MRR)

Key Terms

RAG: Retrieval-Augmented Generation—AI systems that answer questions by first searching for relevant documents

RAG-Fusion: A method that generates multiple queries and uses Reciprocal Rank Fusion to re-rank retrieved documents

LLM: Large Language Model—a type of AI model trained on vast amounts of text data to understand and generate human language

General Query Rewriting (GQR): Refines the original query to remove noise and clarify intent

Keyword Rewriting (KWR): Extracts keywords (nouns, subjects) to align with search engine preferences

Pseudo-Answer Rewriting (PAR): Generates a hypothetical answer to the query to use for semantic retrieval

Core Content Extraction (CCE): Simplifies the query by removing superfluous details to focus on key information

Reciprocal Rank Fusion: An algorithm that combines multiple ranked lists into a single ranking
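
A minimal sketch of Reciprocal Rank Fusion, the merging step used by RAG-Fusion (DMQR-RAG itself uses a reranker instead): each document scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 is the commonly used default; the function name is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists into one, ordered by summed reciprocal-rank score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
print(fused)  # → ['b', 'a', 'c']
```

Documents that rank well in several lists rise to the top, which is why near-identical rewrites add little: they tend to return the same lists.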