
Semantic Energy: Detecting LLM Hallucination Beyond Entropy

Huan Ma, Jiadong Pan, Jing Liu, Yan Chen, Joey Tianyi Zhou, Guangyu Wang, Qinghua Hu, Hua Wu, Changqing Zhang, Haifeng Wang
Tianjin University, Baidu Inc., A*STAR Centre for Frontier AI Research (CFAR), Beijing University of Posts and Telecommunications, University of Chinese Academy of Sciences
arXiv (2025)
Factuality QA Benchmark

📝 Paper Summary

Uncertainty Estimation, Hallucination Detection
Semantic Energy estimates LLM uncertainty by combining semantic clustering with Boltzmann energy derived from unnormalized logits, detecting hallucinations even when the model consistently repeats the same incorrect answer.
Core Problem
Existing methods such as Semantic Entropy rely on normalized probabilities, so they fail when an LLM confidently and consistently generates the same incorrect answer (low aleatoric uncertainty but high epistemic uncertainty).
Why it matters:
  • LLMs frequently 'hallucinate' by confidently stating falsehoods; detecting this requires distinguishing between 'consistent correct' and 'consistent incorrect' responses.
  • Probability-based metrics such as entropy discard the magnitude of the logits when they are normalized, losing a signal of how familiar the model is with the topic from training.
Concrete Example: If an LLM answers 'Paris' to 'Capital of France?' in all 5 samples and 'Mars' to 'Capital of UK?' in all 5 samples, Semantic Entropy is 0 in both cases because each set of answers forms a single semantic cluster. However, the model likely assigns lower raw logit values (higher energy) to the incorrect 'Mars' answer, a difference that Semantic Energy detects.
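The example above can be sketched numerically. This is a minimal illustration, not the paper's implementation: it assumes the discrete (frequency-based) variant of Semantic Entropy, takes energy as the negative logit, and uses made-up logit values. The function names `semantic_entropy` and `mean_energy` are hypothetical.

```python
import math
from collections import Counter


def semantic_entropy(cluster_ids):
    """Entropy of the empirical distribution over semantic clusters."""
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def mean_energy(logits):
    """Average energy, with energy assumed to be the negative raw logit."""
    return -sum(logits) / len(logits)


# Five samples per question; each question yields one semantic cluster.
france_clusters = ["Paris"] * 5
uk_clusters = ["Mars"] * 5

# Hypothetical raw logits of the sampled answers (illustrative values only).
france_logits = [31.0, 30.5, 31.2, 30.8, 31.1]
uk_logits = [18.0, 17.5, 18.3, 17.9, 18.1]

entropy_france = semantic_entropy(france_clusters)
entropy_uk = semantic_entropy(uk_clusters)
energy_france = mean_energy(france_logits)
energy_uk = mean_energy(uk_logits)
```

Both entropies are zero, so entropy alone cannot separate the two questions, while the energy for the 'Mars' answers is higher under these assumed logits.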
Key Novelty
Energy-Based Semantic Confidence
  • Replaces probability-based entropy with energy values computed directly from unnormalized logits (the pre-softmax scores, which the paper refers to as penultimate-layer logits) to capture inherent model confidence.
  • Aggregates these energy scores across clusters of semantically equivalent responses, ensuring that semantic consistency is weighted by the model's raw confidence level.
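The two steps above might be sketched as follows. This summary does not give the exact formulas, so the sketch rests on assumptions: token energy is the negative logit of the sampled token, a response's energy is the mean over its tokens, and a cluster's energy is the sum of its responses' energies divided by the total number of samples. The names `response_energy` and `cluster_energies` are hypothetical, and the semantic cluster labels are taken as given.

```python
from collections import defaultdict


def response_energy(token_logits):
    """Mean negative logit over the sampled tokens of one response."""
    return -sum(token_logits) / len(token_logits)


def cluster_energies(responses):
    """Aggregate response energies per semantic cluster.

    responses: list of (cluster_id, token_logits) pairs, where cluster_id
    marks semantically equivalent answers (clustering is assumed done).
    Returns {cluster_id: energy}; lower energy means higher confidence.
    """
    totals = defaultdict(float)
    for cluster_id, token_logits in responses:
        totals[cluster_id] += response_energy(token_logits)
    n = len(responses)
    # Summing within a cluster lets larger clusters accumulate more
    # (negative) energy when logits are positive, so consistency and raw
    # confidence both lower the score.
    return {cid: total / n for cid, total in totals.items()}


# Illustrative input: two responses in cluster "A", one in cluster "B".
samples = [("A", [10.0, 12.0]), ("A", [11.0, 11.0]), ("B", [4.0, 6.0])]
energies = cluster_energies(samples)
```

Here cluster "A" ends up with lower energy than "B" because it is both more frequent and backed by larger logits.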
Evaluation Highlights
  • +13% improvement in AUROC over Semantic Entropy for hallucination detection on specific failure cases where the baseline is confident but wrong.
  • Improves AUROC from 71.6% to 76.1% on the CSQA dataset using the Qwen3-8B model.
  • Outperforms Semantic Entropy by >5% AUROC on the TriviaQA dataset across multiple models (Qwen3-8B, ERNIE-21B-A3B).
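For readers unfamiliar with the metric, AUROC here measures how well an uncertainty score ranks hallucinated answers above correct ones. A minimal pairwise sketch follows, assuming label 1 marks a hallucination and a higher score means more uncertainty; the function name `auroc` and the sample values are illustrative and not taken from the paper.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    scores: uncertainty scores, higher means more likely hallucinated.
    labels: 1 for hallucinated answers, 0 for correct ones.
    Ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative scores: one hallucination is ranked below a correct answer.
example_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.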
Breakthrough Assessment
7/10
Significant improvement on a critical failure mode of previous uncertainty methods (consistent hallucinations). The method is theoretically grounded in the Boltzmann energy formulation from thermodynamics and is simple to implement.