Real-time Factuality Assessment from Adversarial Feedback

Sanxing Chen, Yukun Huang, Bhuwan Dhingra
Duke University
arXiv (2024)
Factuality RAG Benchmark

📝 Paper Summary

Factuality Assessment Adversarial Evaluation
This paper introduces an adversarial pipeline that uses feedback from RAG-based detectors to iteratively generate deceptive real-time fake news, revealing that existing LLMs struggle to detect such misinformation without up-to-date retrieval.
Core Problem
Existing fake news datasets (e.g., PolitiFact) often appear in LLM pre-training data (contamination) or contain shallow patterns that models exploit as shortcuts, so they fail to test genuine reasoning about current events.
Why it matters:
  • LLMs achieve near-perfect performance on older claims due to data contamination, creating a false sense of security about their fact-checking abilities
  • Current evaluation methods do not adequately test an LLM's ability to reason about unfolding real-time events where parametric knowledge is insufficient
  • Fake news produced by standard neural generators is easily detected by strong models, so it does not provide a rigorous testbed for modern detectors
Concrete Example: When checking a claim about a 2024 event such as an Iranian election, a standard LLM detector may fall back on outdated knowledge (e.g., patterns from 2022). The proposed generator rewrites the claim iteratively: it first swaps the country to 'Saudi Arabia' (easily detected), then refines the claim to attribute the event to a plausible 'fuel price hike', eventually fooling the detector.
Key Novelty
Adversarial Iterative News Rewriting with RAG Feedback
  • Uses a feedback loop where a 'Generator' LLM rewrites news based on rationales provided by a 'Detector' LLM, specifically targeting the detector's reasoning gaps
  • Incorporates real-time retrieval (RAG) into the adversary's feedback, allowing the generator to craft misinformation that is harder to debunk even with external evidence
  • Filters generated candidates using a separate contradiction detector to ensure they remain fake while maximizing plausibility
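The feedback loop described under Key Novelty can be sketched as a greedy search. This is a minimal illustration under stated assumptions, not the authors' implementation: the generator, RAG detector, retriever and contradiction checker (LLM-based in the paper) are replaced by hypothetical callables, and the selection rule (keep the surviving candidate with the lowest detector P(fake); stop when no candidate improves on the current claim) is an assumption. The toy stubs use invented strings with placeholder countries ("Country X", "Country Y") to mirror the concrete example.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; in the paper each role is played by an LLM (plus a retriever).
Retriever = Callable[[str], List[str]]                    # claim -> retrieved evidence
Detector = Callable[[str, List[str]], Tuple[float, str]]  # (claim, evidence) -> (P(fake), rationale)
Generator = Callable[[str, str], List[str]]               # (claim, rationale) -> candidate rewrites
Contradicts = Callable[[str, str], bool]                  # (candidate, true news) -> still false?


def adversarial_rewrite(
    seed_fake: str,
    true_news: str,
    retrieve: Retriever,
    detect: Detector,
    generate: Generator,
    contradicts: Contradicts,
    max_rounds: int = 3,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Greedy rewrite loop that feeds the RAG detector's rationale back to the generator."""
    claim = seed_fake
    p_fake, rationale = detect(claim, retrieve(claim))
    history = [(claim, p_fake)]
    for _ in range(max_rounds):
        # Keep only rewrites that still contradict the real article, i.e. remain fake.
        candidates = [c for c in generate(claim, rationale) if contradicts(c, true_news)]
        if not candidates:
            break
        # Score every surviving candidate with the retrieval-augmented detector.
        scored = [(c, *detect(c, retrieve(c))) for c in candidates]
        best, best_p, best_rationale = min(scored, key=lambda t: t[1])
        if best_p >= p_fake:  # assumption: stop once no rewrite fools the detector more
            break
        claim, p_fake, rationale = best, best_p, best_rationale
        history.append((claim, p_fake))
    return claim, history


# Toy stand-ins with invented strings and scores (hypothetical, for illustration only).
TRUE_NEWS = "Turnout in the Country X election was low."
SEED_FAKE = "Turnout in the Country X election was low because of an internet ban."
FUEL_FAKE = "Turnout in the Country X election was low after a sudden fuel price hike."


def toy_retrieve(claim: str) -> List[str]:
    return [TRUE_NEWS]


def toy_detect(claim: str, evidence: List[str]) -> Tuple[float, str]:
    if "Country Y" in claim:
        return 0.95, "country contradicts the retrieved evidence"
    if "fuel price" in claim:
        return 0.40, "cause is plausible and not refuted by the evidence"
    return 0.90, "no evidence of an internet ban"


def toy_generate(claim: str, rationale: str) -> List[str]:
    # Proposes a country swap and a more plausible cause, like the example above.
    return [claim.replace("Country X", "Country Y"), FUEL_FAKE]


def toy_contradicts(candidate: str, true_news: str) -> bool:
    # Crude stand-in for a contradiction detector: any rewrite that differs counts as false.
    return candidate != true_news


final, history = adversarial_rewrite(
    SEED_FAKE, TRUE_NEWS, toy_retrieve, toy_detect, toy_generate, toy_contradicts
)
print(final)                      # the fuel-price rewrite
print([p for _, p in history])    # -> [0.9, 0.4]
```

With these stubs, the country swap is rejected because the detector flags it as more obviously fake (0.95), while the fuel-price rewrite lowers P(fake) from 0.9 to 0.4; the next round finds no further improvement and the loop stops.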
Architecture
Figure 1: The iterative adversarial fake news generation pipeline
Evaluation Highlights
  • The iterative rewrite process reduces the AUC-ROC of a strong RAG-based GPT-4o detector by 17.5 absolute percentage points (from 82.4% to 64.9%)
  • Retrieval-free detectors (e.g., GPT-4o without RAG) perform near random guessing (48.8% AUC) on the generated dataset, indicating that they cannot reliably judge claims about events missing from their training data
  • The generated dataset is much harder than previous benchmarks: GPT-4o achieves ~84% AUC on prior neural fake news but only ~49% on this new dataset
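The AUC figures above are easier to interpret with the metric's definition in mind: AUC-ROC is the probability that a randomly chosen fake item receives a higher "fake" score than a randomly chosen real one, so 50% means the scores carry no ranking signal. The sketch below computes it directly from that pairwise definition (equivalent to the normalized Mann-Whitney U statistic). It is a generic illustration, not the paper's evaluation code, and the scores are invented toy values.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC from its pairwise definition.

    Returns the probability that a randomly chosen fake item (label 1) gets a
    higher 'fake' score than a randomly chosen real item (label 0); ties count 1/2.
    Runs in O(n_pos * n_neg) time, which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fake and one real example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]  # toy data: 1 = fake, 0 = real
print(auc_roc([0.9, 0.8, 0.3, 0.2], labels))  # perfect ranking -> 1.0
print(auc_roc([0.5, 0.5, 0.5, 0.5], labels))  # constant scores -> 0.5 (chance)
print(auc_roc([0.9, 0.4, 0.6, 0.2], labels))  # one mis-ranked pair -> 0.75
```

On this scale, the reported drop for the RAG-based GPT-4o detector is a plain difference of AUCs (82.4 minus 64.9 gives 17.5 points), and the 48.8% of retrieval-free detectors sits essentially at the 0.5 that uninformative scores produce.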
Breakthrough Assessment
8/10
Effective demonstration of how to break SOTA RAG detectors using adversarial feedback. Highlights critical weaknesses in current factuality evaluation benchmarks.