MHQA: Multi-Hop Question Answering; tasks that require integrating information scattered across multiple documents to answer a single complex query
RAG: Retrieval-Augmented Generation; AI systems that answer questions by first retrieving relevant documents and then generating an answer conditioned on them
Sub-task Planner (SP): The strategic module in REAP that maintains the global task plan, updates dependencies, and initiates replanning if steps fail
Fact Extractor (FE): The execution module in REAP that retrieves documents and extracts structured facts (statement + evidence + reasoning) from them
Re-Planner: A specialized sub-module of the SP, invoked when a step fails, that assesses whether the partial information already gathered is sufficient or whether the plan needs structural repair
F1 score: The harmonic mean of precision and recall, measuring the overlap between the predicted answer and the ground truth
FlashRAG: A Python toolkit for efficient RAG research, used here as the evaluation framework
CoRAG: A dataset/corpus based on English Wikipedia used for retrieval in these experiments
MCTS: Monte Carlo Tree Search; a heuristic search algorithm for decision processes, often used for planning in complex reasoning tasks
ACC†: Accuracy measured with an LLM serving as the judge
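
As a rough illustration of the F1 score entry above, the sketch below computes a token-level F1 between a predicted answer and a ground-truth answer. It assumes lowercasing and whitespace tokenization, which is a simplification: the evaluation framework used in the experiments may normalize answers differently (for example by stripping punctuation and articles). The function name `token_f1` is hypothetical and not taken from FlashRAG.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between two answer strings (illustrative sketch).

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that are correct
    recall = overlap / len(gold_tokens)     # share of gold tokens that were predicted
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One of four predicted tokens matches the single gold token:
    # precision = 0.25, recall = 1.0, F1 = 0.4
    print(round(token_f1("the city of Paris", "Paris"), 2))
```

Because F1 is a harmonic mean, a verbose answer that contains the correct entity is still penalized through low precision, which is why it is usually reported alongside exact-match or judged accuracy.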