Flame: Factuality-aware alignment for LLMs

Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
University of Waterloo, Carnegie Mellon University, Meta AI
arXiv, 5/2024
Factuality RL

📝 Paper Summary

Knowledge internalization (post-training), Hallucination suppression
Flame improves LLM factuality by training the model on its own generated responses rather than on human-written or RAG-generated data that contains facts it does not know, which avoids the hallucination induced by forcing a model to recite unfamiliar facts.
Core Problem
Standard alignment (SFT + RLHF) often degrades factuality because it forces models to learn from human or RAG-generated data containing information unknown to the pre-trained model.
Why it matters:
  • Fine-tuning on new/unknown knowledge inadvertently encourages hallucination by teaching the model to make up information it doesn't actually 'know'
  • Existing RL reward models prioritize helpfulness and length over factuality, often preferring detailed but fabricated responses
Concrete Example: A pilot study on biography generation shows that fine-tuning a standard LLM on high-quality biographies generated by a retrieval-augmented generation (RAG) teacher makes the student model hallucinate *more* than the baseline, because the facts the teacher retrieves are external to the student and were never learned by it.
Key Novelty
Factuality-Aware Alignment (Flame)
  • Identifying 'fact-based' instructions via a classifier to apply specialized training only where needed
  • Constructing SFT data from the model's own generated responses (elicited through few-shot prompting) rather than from human gold data, so that the model is not trained on knowledge it lacks
  • Employing a specialized factuality reward model during DPO that decomposes responses into atomic facts and verifies them using retrieval
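The three steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_fact_based`, `decompose`, and `is_supported` are hypothetical callables standing in for the instruction classifier, the atomic-fact decomposer, and the retrieval-based verifier, and the reward is assumed to be the plain fraction of supported facts. Pairing the highest- and lowest-scoring candidates is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # candidate with the higher factuality reward
    rejected: str  # candidate with the lower factuality reward


def choose_sft_target(prompt: str, human_response: str, self_generated: str,
                      is_fact_based: Callable[[str], bool]) -> str:
    """Use the model's own output for fact-based prompts, human data otherwise."""
    return self_generated if is_fact_based(prompt) else human_response


def factuality_reward(response: str,
                      decompose: Callable[[str], List[str]],
                      is_supported: Callable[[str], bool]) -> float:
    """Fraction of atomic facts in the response that the verifier supports."""
    facts = decompose(response)
    if not facts:
        return 0.0  # assumption: nothing checkable earns no reward
    return sum(1 for fact in facts if is_supported(fact)) / len(facts)


def build_factuality_pair(prompt: str, candidates: List[str],
                          is_fact_based: Callable[[str], bool],
                          decompose: Callable[[str], List[str]],
                          is_supported: Callable[[str], bool],
                          ) -> Optional[PreferencePair]:
    """Build one DPO preference pair, only for fact-based prompts."""
    if not is_fact_based(prompt) or len(candidates) < 2:
        return None
    scores = [factuality_reward(c, decompose, is_supported) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    worst = min(range(len(candidates)), key=scores.__getitem__)
    if scores[best] == scores[worst]:
        return None  # tied rewards carry no preference signal
    return PreferencePair(prompt, candidates[best], candidates[worst])
```

In practice each callable would wrap a model or a retrieval system. Keeping them as parameters makes the routing logic easy to test with toy stand-ins, such as a sentence splitter and a lookup in a small set of known facts.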
Evaluation Highlights
  • +5.6 FActScore on the biography generation task compared to standard alignment (SFT+DPO)
  • Maintains strong instruction-following capability (51.2% win rate on AlpacaEval) while significantly reducing hallucinations
  • Demonstrates that training on RAG-generated data (usually considered 'higher quality') actually hurts the factuality of non-RAG models
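To make the FActScore figure above easier to read, here is a rough sketch of how a FActScore-style number can be computed: the percentage of atomic facts judged as supported, averaged over responses. The function name `factscore` and its input format (one list of boolean judgments per response) are assumptions for illustration. Skipping responses with no judged facts is also an assumption, and the official metric may treat such cases differently.

```python
from typing import List


def factscore(judgments: List[List[bool]]) -> float:
    """Mean percentage (0-100) of supported atomic facts per response."""
    # One inner list per response; True means the fact was judged supported.
    per_response = [100.0 * sum(facts) / len(facts) for facts in judgments if facts]
    if not per_response:
        raise ValueError("need at least one response with judged facts")
    return sum(per_response) / len(per_response)


# Two toy responses: 2 of 4 facts supported, then 3 of 4 supported.
score = factscore([[True, True, False, False], [True, True, True, False]])
print(score)  # 62.5
```

On this scale, a gain of a few points means that a few more facts out of every hundred generated are verifiable against the knowledge source.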
Breakthrough Assessment
7/10
Provides a counter-intuitive but crucial insight: nominally better training data (RAG-generated) can worsen model factuality if the model doesn't know the underlying facts. The proposed solution is effective and practical.