
Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation

Ted Kwartler, Matthew Berman, Alan Aqrawi
Harvard University, Forward Future
arXiv (2024)
Agent Factuality

📝 Paper Summary

Multi-agent collaboration, Hallucination mitigation
Advanced LLMs acting as reviewing agents can effectively detect and correct hallucinations in content generated by other models (or themselves) within a multi-agent workflow, achieving near-perfect detection rates.
Core Problem
LLMs frequently hallucinate factual information, and these errors can persist or worsen in complex workflows if unchecked.
Why it matters:
  • Hallucinations undermine trust in AI-generated content, especially in high-stakes domains requiring accuracy
  • Smaller, less sophisticated models often lack the intrinsic capability to self-correct effectively without external feedback
  • Existing research often focuses on isolated detection rather than orchestrating correction within autonomous multi-agent systems
Concrete Example: When asked to write about a fictional artist 'Flipfloppidy', a primary agent invents a detailed biography (albums, influences). Without a reviewer, this fabrication is presented as fact. In the study, a reviewer agent flags the artist as non-existent, prompting the primary agent to either admit fiction or correct the topic.
Key Novelty
Multi-agentic 'Parenting' Workflow
  • Establish a dual-agent system where a 'Reviewing Agent' acts as a critic to a 'Primary Agent' (content creator), specifically tasked with fact-checking against a known hallucination trigger (a fictional entity)
  • Evaluate the 'parenting' dynamic across varying model sizes, testing whether smaller models can correct larger ones and vice versa
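The dual-agent loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Agent` type, the `run_parenting_round` function, the prompt wording, and the `FLAG`/`OK` reply convention are all assumptions made for the example, and the two stub agents stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: an agent is any function mapping a prompt to a text reply.
Agent = Callable[[str], str]


@dataclass
class ReviewResult:
    draft: str      # first output of the primary agent
    critique: str   # reviewer's feedback on the draft
    flagged: bool   # True if the reviewer reported a fabrication
    final: str      # text after the optional revision step


def run_parenting_round(primary: Agent, reviewer: Agent, topic: str) -> ReviewResult:
    """One write -> review -> revise cycle (assumed protocol, for illustration)."""
    draft = primary(f"Write about the artist {topic}.")
    critique = reviewer(
        "Fact-check the following text. Start your reply with FLAG if it "
        f"contains fabricated claims, otherwise start with OK.\n\n{draft}"
    )
    flagged = critique.strip().upper().startswith("FLAG")
    if not flagged:
        # Reviewer raised no objection, so the draft is kept unchanged.
        return ReviewResult(draft, critique, False, draft)
    final = primary(
        f"Revise your text using this feedback.\n\nText: {draft}\n\nFeedback: {critique}"
    )
    return ReviewResult(draft, critique, True, final)


# Stub agents so the sketch runs without any model API.
def stub_primary(prompt: str) -> str:
    if prompt.startswith("Revise"):
        return "Flipfloppidy is a fictional artist; no real biography exists."
    return "Flipfloppidy released three acclaimed albums."


def stub_reviewer(prompt: str) -> str:
    return "FLAG: Flipfloppidy does not appear to be a real artist."


result = run_parenting_round(stub_primary, stub_reviewer, "Flipfloppidy")
print(result.final)
```

Treating each agent as a plain callable keeps the loop independent of any particular model provider, so the same function could pair a small reviewer with a large primary agent or the reverse.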
Evaluation Highlights
  • Advanced models (Llama3-70b, GPT-4 variants) achieved 98-100% accuracy in identifying hallucinations about the fictional subject
  • Successful revision rates reached 85-100% for top-tier models following feedback
  • Smaller models (Gemma-7b, Mistral) performed poorly: they identified hallucinations in as few as 0% of cases and rarely accepted critique
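As a rough illustration of how rates like these could be computed from per-trial records, consider the sketch below. The `Trial` record, both function names, and the choice to compute the revision rate over flagged trials only are assumptions for the example; the summary does not state the paper's exact denominators, and the toy data is invented purely to exercise the code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    flagged: bool   # reviewer identified the hallucination
    revised: bool   # primary agent corrected its text after feedback


def detection_rate(trials: List[Trial]) -> float:
    """Share of all trials in which the reviewer flagged the hallucination."""
    if not trials:
        return 0.0
    return sum(t.flagged for t in trials) / len(trials)


def revision_rate(trials: List[Trial]) -> float:
    """Share of flagged trials that ended in a successful revision (assumed definition)."""
    flagged = [t for t in trials if t.flagged]
    if not flagged:
        return 0.0
    return sum(t.revised for t in flagged) / len(flagged)


# Invented toy records, not results from the paper.
toy_trials = [
    Trial(flagged=True, revised=True),
    Trial(flagged=True, revised=False),
    Trial(flagged=False, revised=False),
    Trial(flagged=True, revised=True),
]

print(f"detection rate: {detection_rate(toy_trials):.2f}")
print(f"revision rate:  {revision_rate(toy_trials):.2f}")
```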
Breakthrough Assessment
4/10
Provides empirical evidence for the efficacy of multi-agent critique patterns, but the evaluation covers only a single hallucination trigger (a fictional artist), which limits how far the results generalize.