
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

CM Chan, C Xu, R Yuan, H Luo, W Xue, Y Guo…
Hong Kong University of Science and Technology, Hong Kong Polytechnic University, Massachusetts Institute of Technology
arXiv, 4/2024
RAG Factuality QA

📝 Paper Summary

Modularized RAG pipeline · Query rewriting / query generation
RQ-RAG enhances retrieval-augmented generation by training a 7B model to explicitly rewrite, decompose, or disambiguate queries before searching, and uses tree decoding at inference time to choose among the resulting refinement paths.
Core Problem
Standard RAG methods often fail on ambiguous or complex queries because they retrieve with the original query indiscriminately, and existing training datasets provide no explicit supervision for query refinement strategies.
Why it matters:
  • Indiscriminate retrieval for simple queries (like greetings) adds noise and degrades response quality
  • Complex queries requiring multi-hop reasoning cannot be answered by a single search using the original text
  • Ambiguous user intents require clarification or disambiguation before retrieval to provide accurate answers
Concrete Example: For a complex comparison query (e.g., which of X and Y has the larger population), searching with the original text often fails to retrieve adequate information. Instead, the model should break the query into sub-queries (e.g., 'What is the population of X?' then 'What is the population of Y?'), search for each one, and synthesize the answer from the combined evidence.
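A toy sketch of the decompose, search, synthesize flow, assuming a naive token-overlap retriever over a three-passage corpus. The corpus, the query wording, and the helpers `search`, `decompose`, and `synthesize` are all hypothetical: in RQ-RAG the sub-queries and the final answer are generated by the fine-tuned model, not hard-coded or regex-parsed.

```python
import re

# Hypothetical toy corpus standing in for a real retriever's index.
CORPUS = [
    "The population of X is 2.1 million.",
    "The population of Y is 3.4 million.",
    "X was founded in 1820.",
]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int = 1) -> list[str]:
    """Return the k passages sharing the most tokens with the query (stable on ties)."""
    q = tokens(query)
    return sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]


def decompose(query: str) -> list[str]:
    """Hypothetical placeholder for the model's decomposition step (hard-coded)."""
    return ["What is the population of X?", "What is the population of Y?"]


def synthesize(evidence: list[str]) -> str:
    """Toy synthesis: parse population figures and return the larger entity."""
    pops = {}
    for passage in evidence:
        m = re.search(r"population of (\w+) is ([\d.]+) million", passage)
        if m:
            pops[m.group(1)] = float(m.group(2))
    if len(pops) < 2:
        return "insufficient evidence"
    return max(pops, key=pops.get)


query = "Which city has more people, X or Y?"
single = search(query)                                   # one search, original text
multi = [doc for sub in decompose(query) for doc in search(sub)]
print(synthesize(single))  # → insufficient evidence
print(synthesize(multi))   # → Y
```

With the original query every passage ties on a single shared token, so one search surfaces only X's figure; each sub-query matches its target passage far more strongly, which is the intuition behind decomposing before retrieval.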
Key Novelty
Learning to Refine Query (RQ-RAG)
  • Trains a single Llama-2-7B model to dynamically choose between rewriting, decomposing, or disambiguating a query (or skipping retrieval) using special control tokens
  • Constructs a training dataset by using ChatGPT to generate refined queries and, crucially, regenerating the target answers from the actual retrieval results so that each answer is grounded in the context the model will see
  • Uses a tree-decoding strategy at inference time to explore different refinement paths, selecting the best one based on model perplexity or confidence
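The selection step can be sketched as follows, assuming each tree node branches over the three refinement types and that each leaf yields an answer with per-token log-probabilities. The control-token strings, `expand_tree`, and `toy_generate` are hypothetical; the real model also generates the refined query and retrieves documents at each node, and its tree may be bounded differently than this exhaustive enumeration.

```python
import itertools
import math
from dataclasses import dataclass

# Hypothetical control tokens; RQ-RAG's actual special tokens may differ.
ACTIONS = ("<rewrite>", "<decompose>", "<disambiguate>")


@dataclass
class Trajectory:
    actions: tuple[str, ...]      # refinement actions along one root-to-leaf path
    answer: str                   # final answer generated at the leaf
    answer_logprobs: list[float]  # per-token log-probabilities of that answer


def perplexity(logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood); lower means the model is more confident."""
    return math.exp(-sum(logprobs) / len(logprobs))


def expand_tree(generate, max_depth: int) -> list[Trajectory]:
    """Enumerate every action sequence up to max_depth and generate an answer
    for each one (exhaustive only because this toy tree is tiny)."""
    trajectories = []
    for depth in range(1, max_depth + 1):
        for actions in itertools.product(ACTIONS, repeat=depth):
            answer, logprobs = generate(actions)
            trajectories.append(Trajectory(actions, answer, logprobs))
    return trajectories


def select_by_perplexity(trajectories: list[Trajectory]) -> Trajectory:
    """Keep the path whose final answer has the lowest perplexity (first on ties)."""
    return min(trajectories, key=lambda t: perplexity(t.answer_logprobs))


def toy_generate(actions: tuple[str, ...]) -> tuple[str, list[float]]:
    """Hypothetical stand-in for the 7B model: paths that begin by decomposing
    answer 'B' confidently, every other path answers 'A' less confidently."""
    if actions[0] == "<decompose>":
        return "B", [-0.1, -0.2]
    return "A", [-0.9, -1.2]


trajectories = expand_tree(toy_generate, max_depth=2)
best = select_by_perplexity(trajectories)
print(len(trajectories), best.actions, best.answer)  # → 12 ('<decompose>',) B
```

A confidence-based variant only changes the `min` key, for example to the negated total log-probability of the answer; the tree expansion stays the same.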
Evaluation Highlights
  • +1.9% average accuracy improvement over Self-RAG (previous SOTA) on three single-hop QA datasets (ARC-Challenge, PopQA, OpenbookQA) using a 7B model
  • Significantly outperforms baselines on multi-hop datasets; e.g., +4.3% EM on HotpotQA compared to Self-RAG
  • Demonstrates a high potential upper bound: if an oracle picks the best trajectory, performance rises substantially (e.g., up to 63.6% accuracy on ARC-Challenge vs. 52.7% with the current selection method)
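The gap between the selected and the oracle numbers can be computed as below: the oracle counts a question as correct if any explored trajectory is correct, while the deployed system is scored only on the one it selects. The `Candidate` type and the four examples are hypothetical toy data, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    perplexity: float  # lower = more confident


def selected_accuracy(examples: list[tuple[str, list[Candidate]]]) -> float:
    """Fraction of questions whose lowest-perplexity candidate is correct."""
    hits = sum(min(cands, key=lambda c: c.perplexity).answer == gold
               for gold, cands in examples)
    return hits / len(examples)


def oracle_accuracy(examples: list[tuple[str, list[Candidate]]]) -> float:
    """Upper bound: a question counts as correct if ANY candidate is correct."""
    hits = sum(any(c.answer == gold for c in cands) for gold, cands in examples)
    return hits / len(examples)


# Hypothetical (gold answer, candidate trajectories) pairs.
examples = [
    ("B", [Candidate("B", 1.2), Candidate("A", 1.5)]),  # selection is right
    ("C", [Candidate("A", 1.1), Candidate("C", 1.4)]),  # only the oracle is right
    ("D", [Candidate("A", 1.3), Candidate("B", 1.6)]),  # no path is right
    ("A", [Candidate("A", 1.0)]),                       # single path, right
]
print(selected_accuracy(examples), oracle_accuracy(examples))  # → 0.5 0.75
```

Questions like the second one are where a better selection criterion could close the gap without changing the trained model.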
Breakthrough Assessment
7/10
Strong methodological contribution in unifying different query refinement strategies (rewrite/decompose/disambiguate) into one model with a novel data construction pipeline. Improvements over Self-RAG are consistent.