
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought

Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, Heng Ji
Department of Software, Nanjing University; Department of Computer Science, University of Illinois Urbana-Champaign
arXiv (2023)
Factuality, Reasoning, Benchmark

📝 Paper Summary

Hallucination suppression, Verification
RCoT improves arithmetic reasoning by asking LLMs to reconstruct the original problem from their generated solution, comparing the reconstruction to the original to detect and correct factual errors.
Core Problem
Large Language Models often hallucinate conditions, overlook constraints, or misinterpret questions during arithmetic Chain-of-Thought reasoning, leading to incorrect answers despite plausible logic.
Why it matters:
  • Existing self-verification methods usually provide only coarse-grained feedback (e.g., 'answer is wrong') without explaining why, failing to guide specific corrections
  • Factual inconsistencies in reasoning steps render LLMs unreliable for complex problem-solving where precision is critical
  • LLMs struggle to maintain consistency between problem conditions and reasoning steps, often hallucinating numbers or constraints not present in the input
Concrete Example: In a problem stating that a meeting is 'tomorrow, 10/16/1924' (so today is 10/15/1924), ChatGPT overlooks 'tomorrow' and computes as if 10/16 were today. A standard checker might only say 'wrong', whereas RCoT explicitly flags: 'You overlooked the condition that the meeting is tomorrow.'
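The date arithmetic behind this example can be checked in a few lines. This is only an illustrative sketch using Python's standard `datetime` module; the variable names are made up here and do not come from the paper.

```python
from datetime import date, timedelta

# The problem says the meeting is "tomorrow, 10/16/1924".
meeting = date(1924, 10, 16)

# Correct reading: the meeting is tomorrow, so today is one day earlier.
today = meeting - timedelta(days=1)

# Faulty reading (the overlooked condition): treating the meeting date as today.
mistaken_today = meeting

print(today.strftime("%m/%d/%Y"))           # 10/15/1924
print(mistaken_today.strftime("%m/%d/%Y"))  # 10/16/1924
```

Any answer computed from `mistaken_today` is off by one day, which is the kind of error a coarse 'wrong' signal cannot localize.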
Key Novelty
Reverse Chain-of-Thought (RCoT)
  • Ask the LLM to reconstruct the problem statement based solely on its generated solution
  • Decompose both the original and reconstructed problems into structured lists of conditions and compare them item-by-item
  • Generate fine-grained textual feedback identifying specific hallucinations or overlooked conditions to guide the LLM in revising its answer
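The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `LLM` is a hypothetical prompt-in, text-out callable, the prompt wording is invented, and the comparison uses exact matching after normalization as a simple stand-in for however the paper actually compares conditions.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def _normalize(condition: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(condition.lower().split())


def decompose(llm: LLM, problem: str) -> List[str]:
    """Ask the model for one condition per line and return them as a list."""
    reply = llm("List each condition of this problem on its own line:\n" + problem)
    items = [line.strip().lstrip("-* ").strip() for line in reply.splitlines()]
    return [item for item in items if item]


def build_feedback(original: List[str], reconstructed: List[str]) -> List[str]:
    """Compare the two condition lists item by item.

    Assumption: exact match after normalization; a real system would need
    a softer notion of equivalence.
    """
    orig = {_normalize(c) for c in original}
    recon = {_normalize(c) for c in reconstructed}
    feedback = [f"You overlooked the condition: {c}"
                for c in original if _normalize(c) not in recon]
    feedback += [f"You hallucinated the condition: {c}"
                 for c in reconstructed if _normalize(c) not in orig]
    return feedback


def rcot(llm: LLM, problem: str, solution: str) -> str:
    """Reconstruct, decompose, compare, and revise only if mismatches are found."""
    reconstructed = llm("Reconstruct the problem that this solution answers:\n" + solution)
    feedback = build_feedback(decompose(llm, problem), decompose(llm, reconstructed))
    if not feedback:
        return solution  # no inconsistency detected, keep the original answer
    return llm("Problem:\n" + problem + "\nYour solution:\n" + solution
               + "\nFeedback:\n" + "\n".join(feedback) + "\nRevise your solution.")
```

Keeping the feedback as a list of per-condition messages is what makes it fine-grained: the revision prompt names each overlooked or hallucinated condition rather than only stating that the answer is wrong.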
Architecture
Figure 4: The four-step RCoT framework: Reconstruction, Decomposition, Comparison, and Revision.
Evaluation Highlights
  • +4.1% accuracy gain on AQuA dataset (ChatGPT) compared to standard Chain-of-Thought
  • +5.0% accuracy gain on Date dataset (ChatGPT) compared to standard Chain-of-Thought
  • Outperforms Self-Consistency on GSM8K (82.0% vs 81.6%) using significantly fewer inference trials (1 vs 30)
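To make the compute comparison concrete, here is a rough sketch of what a voting baseline such as Self-Consistency does: it samples many answers and keeps the most common one. The `sample_answer` callable and the canned answers are hypothetical placeholders for one sampled chain-of-thought run; they are not taken from the paper.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], trials: int = 30) -> str:
    """Sample `trials` answers and return the most frequent one (majority vote)."""
    votes = Counter(sample_answer() for _ in range(trials))
    return votes.most_common(1)[0][0]


# Toy stand-in for 30 sampled model answers: "18" appears 20 times, "20" 10 times.
fake_answers = iter(["18", "20", "18"] * 10)
result = self_consistency(lambda: next(fake_answers))
print(result)  # 18
```

Each call to `sample_answer` corresponds to one inference trial, so the voting baseline with 30 trials spends far more model calls per problem than the single trial reported for RCoT.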
Breakthrough Assessment
7/10
Novel approach to self-correction via problem reconstruction. Strong results on hard arithmetic tasks with lower compute than voting methods, but limited to arithmetic/logic domains so far.