
REAP: Enhancing RAG with Recursive Evaluation and Adaptive Planning for Multi-Hop Question Answering

Y Zhu, H Zhou, W Hong, T Liu, N Wang
Harbin Institute of Technology, Tencent, University of Waterloo
arXiv, 11/2025
RAG Reasoning Agent QA

📝 Paper Summary

Agentic RAG pipeline
REAP separates reasoning into a Sub-task Planner that dynamically updates a global plan and a Fact Extractor that retrieves and validates evidence, enabling error recovery in complex multi-hop queries.
Core Problem
Existing iterative RAG methods for multi-hop questions often get stuck in local reasoning impasses or fail to exploit latent clues because they lack global planning and dynamic error recovery.
Why it matters:
  • Incremental decomposition of complex queries is brittle; one failed step can derail the entire reasoning chain without a mechanism to recover
  • Current models often extract direct answers while ignoring latent clues necessary for subsequent steps, leading to incomplete evidence
  • Search-based methods like MCTS offer planning but suffer from high computational overhead, making them inefficient for real-time applications
Concrete Example: If a system needs to find 'the director of the movie starring X', it might first search for 'movies starring X'. If the retrieval returns a list but misses the specific movie intended by the context, a standard chain-of-thought method fails. REAP's planner would detect the insufficient fact, diagnose the failure, and trigger a 'Re-Planner' module to reformulate the search query or prune the invalid branch.
Key Novelty
Recursive Evaluation and Adaptive Planning (REAP)
  • Explicitly decouples 'Planning' (Sub-task Planner) from 'Execution' (Fact Extractor) into two distinct modules that operate in a recursive loop
  • Introduces a 'Re-Planner' sub-module that activates only when reasoning fails, performing pragmatic sufficiency checks (is this partial info enough?) or scoped plan repair (rewriting queries/pruning branches)
  • Uses a unified multi-task fine-tuning paradigm to transfer knowledge from data-rich routine planning tasks to data-scarce critical replanning scenarios
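The planner/extractor loop described above can be pictured with a toy sketch that reuses the "director of the movie starring X" example. This is not the paper's implementation: the names `SubTask`, `Plan`, `fact_extractor`, `re_planner`, and `run`, the `{prev}` placeholder, and the dictionary-based "retrieval" are all hypothetical stand-ins, and the real modules are fine-tuned language models rather than lookups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubTask:
    query: str          # may contain "{prev}", filled with the latest fact
    attempts: int = 0   # how many times this branch has been repaired


@dataclass
class Plan:
    pending: List[SubTask]
    facts: List[str] = field(default_factory=list)


def fact_extractor(query: str, corpus: Dict[str, str]) -> Optional[str]:
    """Toy execution step: an exact-match lookup standing in for
    retrieval plus evidence validation. Returns None on failure."""
    return corpus.get(query)


def re_planner(task: SubTask, rewrites: Dict[str, str],
               max_attempts: int = 2) -> Optional[SubTask]:
    """Toy repair step, called only after a failure: rewrite the query
    if a rewrite is known, otherwise prune the branch (return None)."""
    if task.attempts >= max_attempts or task.query not in rewrites:
        return None
    return SubTask(rewrites[task.query], task.attempts + 1)


def run(plan: Plan, corpus: Dict[str, str],
        rewrites: Dict[str, str]) -> List[str]:
    """Alternate between planning and execution until no sub-task is left."""
    while plan.pending:
        task = plan.pending.pop(0)
        # Planning side: ground the sub-task in facts gathered so far.
        if plan.facts:
            task = SubTask(task.query.replace("{prev}", plan.facts[-1]),
                           task.attempts)
        fact = fact_extractor(task.query, corpus)
        if fact is not None:
            plan.facts.append(fact)
            continue
        repaired = re_planner(task, rewrites)
        if repaired is not None:
            plan.pending.insert(0, repaired)  # retry the repaired branch first
        # Otherwise the branch is pruned and the loop moves on.
    return plan.facts


# Hypothetical data: the first query fails and is rewritten once.
corpus = {
    "film starring X released in 1999": "Movie M",
    "director of Movie M": "Director D",
}
rewrites = {"movies starring X": "film starring X released in 1999"}
plan = Plan([SubTask("movies starring X"), SubTask("director of {prev}")])
facts = run(plan, corpus, rewrites)
```

In this sketch the re-planner is the only place where queries are rewritten or branches dropped, mirroring the idea that replanning activates only when a step fails. The pragmatic sufficiency check is omitted for brevity.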
Architecture
Figure 1: The REAP framework architecture, illustrating the recursive loop between the Sub-task Planner (SP) and Fact Extractor (FE).
Evaluation Highlights
  • Outperforms state-of-the-art method R1-Searcher by +4.6% F1 on HotpotQA and +10.2% F1 on 2WikiMultihopQA
  • Achieves superior generalization on out-of-domain datasets (MuSiQue, Bamboogle) despite being trained only on HotpotQA and 2WikiMultihopQA
  • Surpasses Fine-Tuned Standard RAG by +6.8% F1 on HotpotQA, suggesting the gains stem from the iterative architecture rather than from training data alone
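For readers unfamiliar with the metric, the F1 scores above are presumably the answer-level token-overlap F1 commonly used for multi-hop QA benchmarks; the summary does not state the exact scoring script. The sketch below is a simplified version under that assumption: the function name `token_f1` is hypothetical, and normalization is reduced to lowercasing and whitespace splitting, whereas standard evaluation scripts typically also strip punctuation and articles.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A partially correct answer earns partial credit: a three-token prediction that shares one token with a two-token gold answer has precision 1/3 and recall 1/2, giving an F1 of 0.4.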
Breakthrough Assessment
8/10
Strong performance gains and a logically sound architecture that addresses the brittleness of static chains-of-thought via explicit replanning. The unified training strategy for scarce failure cases is a smart, practical contribution.