
One SPACE to Rule Them All: Jointly Mitigating Factuality and Faithfulness Hallucinations in LLMs

Pengbo Wang, Chaozhuo Li, Chenxu Wang, Liwen Zheng, Litian Zhang, Xi Zhang
Beijing University of Posts and Telecommunications, Shihezi University
arXiv (2025)
Factuality Benchmark

Paper Summary

Hallucination suppression Activation steering / Model editing
SPACE identifies and edits a shared subspace of neural activations where both factuality and faithfulness intersect, allowing simultaneous improvement of both metrics without the trade-offs inherent in single-task optimization.
Core Problem
Existing methods mitigate factuality and faithfulness hallucinations independently, but interventions targeting one type often degrade performance on the other due to distorted activation subspaces.
Why it matters:
  • LLM reliability is compromised when fixing one error type introduces another (e.g., increasing factual accuracy but causing the model to ignore user instructions)
  • Current approaches force a trade-off: TruthX improves factuality but hurts faithfulness on PDTB, while CAD improves faithfulness but degrades factuality
  • Theoretical analysis reveals that divergent optimization directions during training physically separate the activation patterns for these two tasks
Concrete Example: When a model is optimized for factuality (e.g., TruthX), it correctly states 'Canberra is the capital of Australia' but might answer 'The cheetah runs fastest' to the trick question 'Who runs faster, the turtle or the rabbit?', ignoring the context of the fable (a faithfulness failure). Conversely, optimizing for faithfulness might respect the prompt but hallucinate facts.
Key Novelty
SPACE (Spatial Processing for Activated Combined Embeddings)
  • Identifies a 'shared subspace' in the model's activations where neurons contribute to both factuality and faithfulness, rather than treating them as disjoint tasks
  • Uses a hybrid probe strategy combining contrastive learning and spectral clustering to pinpoint these intersectional features
  • Applies targeted editing vectors to specific attention heads during inference to steer the model into this shared optimal state
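
The steering idea in the bullets above can be illustrated with a toy sketch. This is not the authors' implementation: the paper's hybrid probe (contrastive learning plus spectral clustering) is replaced here by a much simpler stand-in, the unit bisector of two mean-difference directions, and every function name, vector, and the strength parameter `alpha` is a hypothetical chosen for illustration. It only shows how a single edit vector can raise an activation's projection on both a factuality direction and a faithfulness direction at once.

```python
import math

def mean_difference(pos, neg):
    """Direction from the mean of `neg` activations to the mean of `pos` activations."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def shared_direction(d_fact, d_faith):
    """Unit bisector of the two directions (assumed stand-in for a shared subspace).

    It has the same cosine with both inputs. Raises ValueError if the two
    directions are exactly opposed, since then no shared direction exists.
    """
    u, w = normalize(d_fact), normalize(d_faith)
    return normalize([a + b for a, b in zip(u, w)])

def steer(activation, direction, alpha):
    """Shift one activation vector along `direction` with strength `alpha`."""
    return [h + alpha * d for h, d in zip(activation, direction)]

# Toy 3-dimensional "head activations" (made-up numbers, not real model data).
d_fact = mean_difference([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
                         [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
d_faith = [0.0, 5.0, 0.0]
shared = shared_direction(d_fact, d_faith)
edited = steer([0.0, 0.0, 1.0], shared, alpha=2.0)
```

In a real model the edit would be added to the outputs of selected attention heads at inference time; which heads to edit and how strongly are choices this summary does not specify.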
Evaluation Highlights
  • Reported to outperform the TruthX and CAD baselines by improving factuality on TruthfulQA and faithfulness on PDTB at the same time (this summary does not give the specific numeric gains)
  • Demonstrates the existence of a theoretical trade-off between factuality and faithfulness in standard models like Llama-2-7b, which SPACE successfully mitigates
  • Validates the 'killing two birds with one stone' effect, where a single intervention enhances performance across distinct hallucination categories
Breakthrough Assessment
7/10
Novel theoretical framing of the factuality-faithfulness trade-off and a geometric solution via shared subspace editing. While effective, it builds on existing steering concepts.