
KDCM: Reducing Hallucination in LLMs through Explicit Reasoning Structures

Jinbo Hao, Kai Yang, Qingzhen Su, Yifan Li, Chao Jiang
School of Computer Engineering, Jiangsu Ocean University; School of Computer Science and Technology, Soochow University
arXiv (2026)
Factuality Reasoning KG QA

📝 Paper Summary

Prompt-Induced Hallucination Mitigation, Chain-of-Thought Reasoning, Knowledge Distillation
KDCM mitigates prompt-induced hallucinations by embedding executable code into the reasoning prompt to guide knowledge exploration and enforce structured intermediate steps.
Core Problem
Large language models frequently suffer from prompt-induced hallucinations, where ambiguous or misleading prompts cause models to generate fluent but factually incorrect reasoning traces.
Why it matters:
  • Reliance on internal probabilistic prediction leads to error accumulation across multi-step reasoning tasks
  • Existing retrieval or verification methods often fail to explicitly constrain the model's internal reasoning process itself
  • Hallucinations prevent reliable deployment in safety-critical domains like scientific research and clinical decision support
Concrete Example: When a model is given an underspecified prompt about a complex entity relationship, it might hallucinate a plausible-sounding connection. KDCM instead generates code to traverse a knowledge graph, validating the relationship step-by-step before answering.
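The step-by-step validation described in this example can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph `KG`, its facts, and the helper `validate_path` are all hypothetical names introduced here, and a real system would query an external knowledge graph instead of an in-memory dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge graph: (head, relation) -> set of tail entities.
# The facts are illustrative only and do not come from the paper.
KG: Dict[Tuple[str, str], Set[str]] = {
    ("Marie Curie", "born_in"): {"Warsaw"},
    ("Warsaw", "capital_of"): {"Poland"},
}


def validate_path(start: str, hops: List[Tuple[str, str]]) -> Tuple[bool, List[str]]:
    """Check each (relation, expected_tail) hop against the graph, in order.

    Returns whether every hop is supported, plus a trace of the checks made.
    Validation stops at the first unsupported hop.
    """
    trace: List[str] = []
    current = start
    for relation, expected in hops:
        tails = KG.get((current, relation), set())
        if expected not in tails:
            trace.append(f"FAIL: {current} -[{relation}]-> {expected}")
            return False, trace
        trace.append(f"OK: {current} -[{relation}]-> {expected}")
        current = expected
    return True, trace


# A supported two-hop chain and an unsupported (hallucinated) single hop.
supported, supported_trace = validate_path(
    "Marie Curie", [("born_in", "Warsaw"), ("capital_of", "Poland")]
)
unsupported, unsupported_trace = validate_path("Marie Curie", [("born_in", "Paris")])
```

Stopping at the first failed hop means a plausible-sounding but unsupported connection is rejected before it can feed into later reasoning steps.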
Key Novelty
Code-Guided Knowledge Distillation Chain
  • Embeds a programmable module (executable code) within reasoning prompts to act as an explicit control signal for knowledge exploration
  • Reformulates input prompts into structured sub-problems and uses code to traverse external knowledge graphs, constraining intermediate reasoning steps
  • Integrates this structured guidance into a knowledge distillation framework to teach the model to self-correct and verify its own logic
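One way to picture the first two points above is a prompt builder that lists the sub-problems and embeds an exploration snippet alongside them. The summary does not specify the actual prompt format, so everything here is an assumption made for illustration: the function `build_code_guided_prompt`, the section labels, and the wording of the final instruction are hypothetical.

```python
from typing import List


def build_code_guided_prompt(question: str, sub_problems: List[str], code: str) -> str:
    """Assemble a reasoning prompt that carries sub-problems and embedded code.

    The layout is a hypothetical sketch, not the format used in the paper.
    """
    # Number the sub-problems so intermediate steps have an explicit order.
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(sub_problems, start=1))
    return (
        f"Question: {question}\n\n"
        f"Sub-problems:\n{steps}\n\n"
        "Exploration code (run against the knowledge graph before answering):\n"
        f"{code.strip()}\n\n"
        "Answer using only facts returned by the code above."
    )


example_prompt = build_code_guided_prompt(
    question="In which country was Marie Curie born?",
    sub_problems=[
        "Find the city where Marie Curie was born.",
        "Find the country that city belongs to.",
    ],
    code="city = kg.tails('Marie Curie', 'born_in')\ncountry = kg.tails(city, 'capital_of')",
)
```

The embedded snippet refers to a `kg.tails` helper that is likewise hypothetical; it stands in for whatever graph-access interface a concrete system exposes.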
Evaluation Highlights
  • 15.64% improvement in HIT@1 over baselines using GPT-4 and LLaMA-3.3
  • Scores exceeding 95% on HIT@1, HIT@3, and HIT@5 across several evaluation settings
  • Performance remains robust even when prompts are deliberately made underspecified or ambiguous
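For readers unfamiliar with the metric, HIT@k is commonly computed as the fraction of questions whose gold answer appears among the top k ranked predictions. The sketch below assumes that common definition; the paper may define or normalize the metric differently, and the function name `hit_at_k` and the sample data are hypothetical.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of questions whose gold answer is within the top-k predictions."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    # A question counts as a hit if its gold answer is in the first k predictions.
    hits = sum(1 for preds, answer in zip(ranked, gold) if answer in preds[:k])
    return hits / len(gold)


# Illustrative data only: three questions, three ranked candidates each.
ranked_predictions = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
gold_answers = ["a", "y", "missing"]

hit1 = hit_at_k(ranked_predictions, gold_answers, k=1)  # only the first question hits
hit3 = hit_at_k(ranked_predictions, gold_answers, k=3)  # first and second questions hit
```

Because the top-k window only grows with k, HIT@k can never decrease as k increases, which is why HIT@1 is the strictest of the three reported scores.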
Breakthrough Assessment
7/10
Strong empirical results (>95% HIT scores) and a novel integration of executable code into the reasoning/distillation loop, though reliance on external structured knowledge may limit applicability in unstructured domains.