
Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering

Y Gao, Y Zhu, Y Cao, Y Zhou, Z Wu, Y Chen, S Wu…
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
arXiv, March 2024
QA RAG Agent Reasoning

📝 Paper Summary

Open Domain Multi-Hop Question Answering (ODMHQA) · Hallucination mitigation
Dr3 is a post-hoc mechanism for multi-hop QA: an LLM-based discriminator detects off-topic answers, and a corrector iteratively repairs the reasoning chain by backtracking through it.
Core Problem
LLMs frequently generate off-topic answers (irrelevant to the question type) in open-domain multi-hop QA due to error propagation in reasoning, planning, and retrieval.
Why it matters:
  • Off-topic answers account for roughly one-third of incorrect answers in ODMHQA tasks, significantly degrading performance
  • Existing methods like ReAct intertwine reasoning and planning, making it difficult to isolate and fix the specific step (Decomposition, Sub-Question, or Composition) causing the drift
Concrete Example: For the question 'In which year was David Beckham's wife born?', an LLM might answer 'Barack Obama' (a name, not a year). Dr3 detects this type mismatch and backtracks to find the correct year.
Key Novelty
Discriminate-Re-Compose-Re-Solve-Re-Decompose (Dr3)
  • Discriminator: Uses the LLM itself to judge if a generated answer matches the expected semantic type of the question (e.g., Year vs. Person)
  • Corrector: A backtracking mechanism that systematically revises the solving history in reverse order (Composition → Sub-Question → Decomposition) until the answer is on-topic
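The discriminate-then-backtrack loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `dr3_answer`, the four callables (`decompose`, `solve`, `compose`, `is_on_topic`), and the `max_rounds` cap are all hypothetical stand-ins for LLM and retrieval calls whose prompts and stopping rules are not given in this summary.

```python
from typing import Callable, List


def dr3_answer(
    question: str,
    decompose: Callable[[str], List[str]],                # question -> sub-questions (assumed LLM call)
    solve: Callable[[str], str],                          # sub-question -> sub-answer (assumed retrieval + LLM)
    compose: Callable[[str, List[str], List[str]], str],  # sub-results -> final answer (assumed LLM call)
    is_on_topic: Callable[[str, str], bool],              # Discriminator (assumed LLM judgment)
    max_rounds: int = 2,                                  # assumed cap; not taken from the paper
) -> str:
    # Initial pass: Decomposition -> Sub-Question solving -> Composition.
    subs = decompose(question)
    sub_answers = [solve(q) for q in subs]
    answer = compose(question, subs, sub_answers)

    for _ in range(max_rounds):
        # Discriminate: stop as soon as the answer matches the question type.
        if is_on_topic(question, answer):
            return answer

        # Re-Compose: redo only the last step, reusing the sub-answers.
        answer = compose(question, subs, sub_answers)
        if is_on_topic(question, answer):
            return answer

        # Re-Solve: backtrack one step further and redo the sub-questions.
        sub_answers = [solve(q) for q in subs]
        answer = compose(question, subs, sub_answers)
        if is_on_topic(question, answer):
            return answer

        # Re-Decompose: backtrack to the first step and rebuild everything.
        subs = decompose(question)
        sub_answers = [solve(q) for q in subs]
        answer = compose(question, subs, sub_answers)

    # Budget exhausted: return the latest answer, on-topic or not.
    return answer
```

Ordering the corrections from cheapest (Re-Compose) to most expensive (Re-Decompose) means that earlier work is discarded only when a later-stage fix fails, which mirrors the reverse-order backtracking in the Corrector bullet.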
Architecture
Figure 4: The workflow of the Dr3 mechanism, including the Discriminator and the three-stage Corrector (Re-Compose, Re-Solve, Re-Decompose).
Evaluation Highlights
  • Reduces off-topic answers by nearly 13% compared to ReAct on HotpotQA and 2WikiMultiHopQA
  • Improves Exact Match (EM) by nearly 3% over the ReAct baseline on both datasets
  • Demonstrates that 62% of off-topic errors stem from sub-question steps (planning, passage retrieval, reasoning), which the Re-Solve module specifically targets
Breakthrough Assessment
7/10
Solid engineering contribution addressing a specific, prevalent error type (off-topic). The backtracking mechanism is effective, though it relies on heuristic iterative correction rather than a fundamental architectural shift.