
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu, Jun Zhao
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences
arXiv (2024)
Factuality RL Benchmark

📝 Paper Summary

Factuality Alignment Hallucination Mitigation Preference Learning
APEFT improves the generalization of factuality tuning by constructing fine-grained 'atomic' preference pairs, which target individual facts rather than whole paragraphs, to address the under-alignment problem observed in out-of-domain settings.
Core Problem
Existing preference learning methods for factuality are primarily evaluated on in-domain data, but they fail to generalize to out-of-domain (OOD) queries, where performance often stagnates or decreases.
Why it matters:
  • Models tuned for factuality on one task (e.g., biographies) often fail to apply that factual behavior to other domains (e.g., general knowledge questions), limiting real-world utility
  • Standard paragraph-level feedback is too coarse, preventing models from learning specifically *which* facts are incorrect versus correct
  • Current methods exhibit 'under-alignment,' where the model's behavior barely changes on OOD inputs, rather than 'over-alignment' to spurious features
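To show why paragraph-level feedback is coarse, here is a minimal sketch of the standard DPO objective, the usual preference-learning baseline. It assumes each response is scored by the sum of its per-token log-probabilities. The names `sequence_logp`, `dpo_loss`, and the toy numbers are illustrative assumptions, not taken from the paper. The loss only sees one scalar per response, so it gives no explicit signal about which sentence made the rejected paragraph worse.

```python
import math
from typing import Sequence

def sequence_logp(token_logps: Sequence[float]) -> float:
    """Log-probability of a whole response: the sum of its per-token log-probs."""
    return sum(token_logps)

def softplus(x: float) -> float:
    """log(1 + e^x), computed without overflow for large |x|."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss, -log(sigmoid(margin)), for a single preference pair.

    All four inputs are whole-response log-probs (see sequence_logp) under the
    trainable policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)

# Toy paragraph-level pair (hypothetical per-token log-probs): the responses
# differ only in their last fact, but the loss sees just the two summed values.
chosen_tokens = [-0.2, -0.1, -0.3, -0.4]
rejected_tokens = [-0.2, -0.1, -0.3, -1.2]
loss = dpo_loss(sequence_logp(chosen_tokens), sequence_logp(rejected_tokens),
                ref_chosen=-1.2, ref_rejected=-1.6)
print(round(loss, 4))
```

Because `sequence_logp` collapses every token into one number, a single wrong date inside a long paragraph contributes to the margin in the same undifferentiated way as all the correct sentences around it.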
Concrete Example: A model trained to be factual on biographies (In-Domain) might still hallucinate when asked about general topics like 'What are the contributions of Albert Einstein?' (Out-of-Domain), showing no improvement over the base model because it hasn't learned the general principle of factuality.
Key Novelty
Atomic Preference Enhanced Factuality Tuning (APEFT)
  • Decomposes general responses into atomic sentences containing single facts to isolate specific errors
  • Constructs 'atomic preferences' by comparing a model's incorrect generation of a specific fact against a corrected version, using a knowledge detection prompt to verify that the model 'knows' the fact but failed to state it
  • Combines these fine-grained atomic preferences with general paragraph-level preferences during training to teach the model to attend to individual factual claims
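The construction steps above can be sketched as follows. This is a hypothetical illustration, not the authors' code. The fact checker (`is_supported`), the knowledge detection prompt (`knows_fact`), and the source of corrected sentences (`correct`) are stand-in callbacks. The prompt format for an atomic pair (the query plus the response prefix before the error) and the plain concatenation used to mix atomic and paragraph-level pairs are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    level: str  # "paragraph" or "atomic"

def build_atomic_pairs(
    query: str,
    sentences: List[str],
    is_supported: Callable[[str], Optional[bool]],
    knows_fact: Callable[[str], bool],
    correct: Callable[[str], Optional[str]],
) -> List[PreferencePair]:
    """Turn the erroneous atomic sentences of one response into preference pairs.

    is_supported: hypothetical fact checker for a single atomic sentence.
    knows_fact:   hypothetical stand-in for the knowledge detection prompt.
    correct:      hypothetical source of a corrected version of the sentence.
    """
    pairs: List[PreferencePair] = []
    for i, sentence in enumerate(sentences):
        if is_supported(sentence):
            continue  # the fact is already correct; nothing to contrast
        if not knows_fact(sentence):
            continue  # keep only facts the model knows but failed to state
        fixed = correct(sentence)
        if fixed is None:
            continue  # no corrected version available
        # Assumed prompt format: the query plus the response prefix before the error.
        prefix = " ".join(sentences[:i])
        prompt = f"{query}\n{prefix}" if prefix else query
        pairs.append(PreferencePair(prompt, chosen=fixed, rejected=sentence, level="atomic"))
    return pairs

# Usage with toy lookup tables in place of real checkers.
query = "What are the contributions of Albert Einstein?"
sentences = [
    "Albert Einstein was a physicist.",
    "He was born in 1880.",  # wrong: he was born in 1879
    "He developed the theory of relativity.",
]
truth = {sentences[0]: True, sentences[1]: False, sentences[2]: True}
fixes = {sentences[1]: "He was born in 1879."}

pairs = build_atomic_pairs(query, sentences, truth.get,
                           knows_fact=lambda s: True, correct=fixes.get)

# Mix with ordinary paragraph-level pairs (simple concatenation is assumed here).
paragraph_pairs = [PreferencePair(query, chosen="<more factual paragraph>",
                                  rejected="<less factual paragraph>", level="paragraph")]
training_set = paragraph_pairs + pairs
```

Each atomic pair contrasts two sentences that differ in exactly one fact. A preference objective such as DPO therefore receives a signal localized to that claim instead of one spread across a whole paragraph.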
Evaluation Highlights
  • APEFT improves factuality by an average of +3.45% over standard preference learning, averaged across the In-Domain (Bio) and Out-of-Domain (FAVA, FPQA, KUQA) datasets
  • On the OOD dataset FAVA, APEFT achieves a 51.5% win rate, outperforming standard DPO (44.6%) and other baselines
  • Token distribution analysis confirms APEFT increases the number of 'shifted tokens' on OOD data, effectively mitigating the under-alignment problem
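A token distribution analysis of this kind can be sketched as follows. The summary does not define 'shifted tokens', so this sketch borrows one common convention as an assumption: for each token the aligned model emits, look up its rank in the base model's next-token distribution under the same prefix. Rank 1 counts as unshifted, ranks 2 to 3 as marginal, and anything lower as shifted. The helper names and the toy distributions are hypothetical.

```python
from typing import Dict, List

def token_rank(dist: Dict[str, float], token: str) -> int:
    """1-based rank of `token` in a base-model distribution (higher prob ranks first)."""
    ordered = sorted(dist, key=dist.get, reverse=True)
    return ordered.index(token) + 1 if token in ordered else len(ordered) + 1

def classify_shifts(aligned_tokens: List[str],
                    base_dists: List[Dict[str, float]]) -> Dict[str, int]:
    """Count unshifted / marginal / shifted tokens (assumed thresholds: 1, 2-3, >3).

    base_dists[i] is the base model's next-token distribution given the aligned
    model's own prefix aligned_tokens[:i] (teacher forcing).
    """
    counts = {"unshifted": 0, "marginal": 0, "shifted": 0}
    for token, dist in zip(aligned_tokens, base_dists):
        rank = token_rank(dist, token)
        if rank == 1:
            counts["unshifted"] += 1
        elif rank <= 3:
            counts["marginal"] += 1
        else:
            counts["shifted"] += 1
    return counts

# Toy example with hand-written base-model distributions.
aligned = ["Einstein", "was", "born", "in", "1879"]
base_dists = [
    {"Einstein": 0.9, "He": 0.1},
    {"was": 0.8, "is": 0.2},
    {"worked": 0.5, "born": 0.3, "lived": 0.2},
    {"in": 0.95, "on": 0.05},
    {"1880": 0.4, "1878": 0.3, "1881": 0.2, "1879": 0.1},
]
counts = classify_shifts(aligned, base_dists)
print(counts)
```

Under this reading, an under-aligned model on OOD inputs would show almost only unshifted tokens, because it still follows the base model's top choices. A larger 'shifted' count indicates that tuning actually changed its behavior.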
Breakthrough Assessment
7/10
Solid contribution identifying 'under-alignment' as the cause of poor OOD factuality and proposing a logical, effective solution (atomic preferences). The gains are consistent, though the scope is limited to biography-based training.