
LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction

Jensen Zhang, Ningyuan Liu, Yijia Fan, Zihao Huang, Qinglin Zeng, Kaitong Cai, Jian Wang, Keze Wang
Sun Yat-sen University
arXiv.org (2025)
Factuality RL

📝 Paper Summary

Hallucination suppression, Inference-time intervention
LLM-CAS trains a hierarchical reinforcement learning agent to dynamically select and apply temporary perturbations to specific neuron activations during inference to correct hallucinations without permanent parameter changes.
Core Problem
Static model editing methods struggle with context-dependent hallucinations and often damage unrelated knowledge (catastrophic forgetting), while existing dynamic methods rely on heuristic rules lacking adaptability.
Why it matters:
  • Hallucinations prevent reliable deployment of LLMs in mission-critical applications where factual accuracy is paramount
  • Retraining or full-model fine-tuning (RLHF/SFT) is computationally expensive and data-intensive
  • Permanent parameter edits are brittle; correcting one fact often breaks others or degrades general capabilities
Concrete Example: Suppose a question elicits a hallucination, such as a wrong date for an event. Static editing fixes it by permanently changing weights, which can break answers about similar but distinct events. LLM-CAS instead applies a temporary 'patch' to neuron activations for that specific query context only, leaving the weights untouched.
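The difference between a permanent weight edit and a temporary activation patch can be illustrated with a toy layer. This is a minimal sketch, not the paper's implementation: the one-layer "model", the `forward` and `scale_neuron` helpers, and all numbers are hypothetical and chosen only to show that the patch affects a single call and leaves the weights unchanged.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Patch = Callable[[Vector], Vector]


def forward(x: Sequence[float],
            weights: Sequence[Sequence[float]],
            patch: Optional[Patch] = None) -> Vector:
    """Toy layer: linear map followed by ReLU.

    `patch`, if given, edits the activations for this call only.
    The weights themselves are never modified.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return patch(hidden) if patch is not None else hidden


def scale_neuron(index: int, factor: float) -> Patch:
    """Hypothetical perturbation: scale one neuron's activation."""
    def apply(hidden: Vector) -> Vector:
        patched = list(hidden)  # copy, so the caller's list is untouched
        patched[index] *= factor
        return patched
    return apply


weights = [[1.0, 0.0], [0.5, 0.5]]
query = [2.0, 4.0]

baseline = forward(query, weights)                             # no intervention
patched = forward(query, weights, patch=scale_neuron(1, 0.0))  # patch this query only
after = forward(query, weights)                                # later calls are unaffected
```

Here `after` equals `baseline` because nothing persistent was changed. A static edit would instead rewrite `weights`, altering the output for every later input.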
Key Novelty
Hierarchical Reinforcement Learning for Real-Time Neuron Perturbation
  • Frames hallucination correction as a hierarchical decision process: a high-level policy selects a functional network region, and a low-level policy determines the specific perturbation type and magnitude
  • Utilizes a 'learnable dynamic mask' combined with input-specific causal tracing to pinpoint exactly which neurons to perturb for a given input, ensuring interventions are sparse and targeted
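The two-level decision and the sparse mask described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the greedy `max` calls stand in for the learned RL policies, and the region names, the action set (`"scale"`, `"shift"`), and the rule combining mask logits with causal-tracing scores are hypothetical, since the summary does not specify them.

```python
import math
from typing import Dict, List, Tuple

Action = Tuple[str, float]  # (perturbation type, magnitude), hypothetical action space


def select_region(region_scores: Dict[str, float]) -> str:
    """High-level policy stand-in: greedily pick the best-scoring region."""
    return max(region_scores, key=region_scores.get)


def select_perturbation(action_scores: Dict[Action, float]) -> Action:
    """Low-level policy stand-in: greedily pick perturbation type and magnitude."""
    return max(action_scores, key=action_scores.get)


def sparse_mask(mask_logits: List[float],
                causal_scores: List[float],
                threshold: float = 0.5) -> List[int]:
    """Assumed combination rule: sigmoid(mask logit) * causal score > threshold."""
    gates = [1.0 / (1.0 + math.exp(-logit)) for logit in mask_logits]
    return [1 if g * c > threshold else 0 for g, c in zip(gates, causal_scores)]


def perturb(activations: List[float], mask: List[int],
            kind: str, magnitude: float) -> List[float]:
    """Apply the perturbation to masked neurons only; return a new list."""
    if kind not in ("scale", "shift"):
        raise ValueError(f"unknown perturbation type: {kind}")
    out = []
    for a, m in zip(activations, mask):
        if not m:
            out.append(a)                 # unmasked neurons pass through unchanged
        elif kind == "scale":
            out.append(a * magnitude)
        else:
            out.append(a + magnitude)
    return out


# Hypothetical inputs for a single query.
activations = {"attention": [1.0, 2.0], "mlp": [1.0, -2.0, 4.0]}
region = select_region({"attention": 0.2, "mlp": 0.9})
kind, magnitude = select_perturbation({("scale", 0.5): 0.7, ("shift", 1.0): 0.1})
mask = sparse_mask(mask_logits=[4.0, -4.0, 4.0], causal_scores=[0.9, 0.9, 0.1])
patched = perturb(activations[region], mask, kind, magnitude)
```

In this toy run only the first neuron of the selected region passes both the learned gate and the causal score, so the intervention touches a single activation. That mirrors the sparse, targeted behavior the summary describes, although a trained agent would sample from learned policies rather than take a fixed greedy choice.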
Evaluation Highlights
  • +10.98 percentage points accuracy improvement on StoryCloze compared to the base model
  • +2.71 percentage points accuracy improvement on TriviaQA compared to the base model
  • +2.06 percentage points improvement on TruthfulQA (MC1 score) compared to the base model
Breakthrough Assessment
7/10
Applies hierarchical RL to inference-time intervention in a novel way and outperforms both static editing and heuristic dynamic methods. The integration of causal tracing with learned policies is promising.