← Back to Paper List

Reducing conversational agents’ overconfidence through linguistic calibration

(FAIR) Sabrina J. Mielke, Anthur Szlam, Emily Dinan, Y-Lan Boureau
Johns Hopkins University, Facebook AI Research
arXiv, 12/2020 (2022)
Factuality QA Benchmark

📝 Paper Summary

Hallucination suppression Calibration of confidence
The paper improves chatbot reliability by training a calibrator to predict answer correctness and using controllable generation to align the model's verbalized confidence (e.g., 'I think...') with that prediction.
Core Problem
State-of-the-art open-domain dialogue agents are 'linguistically uncalibrated': they often express high confidence in factually incorrect answers, misleading users.
Why it matters:
  • Conversational agents (like BlenderBot) often hallucinate facts while sounding authoritative, which risks misleading users who genuinely don't know the answer
  • Prior work focuses on factual accuracy or probabilistic calibration (logits), but less on 'linguistic calibration'—whether the generated text itself communicates appropriate doubt
  • Even if accuracy isn't perfect, 'owning ignorance' via metacognitive features makes agents more transparent and trustworthy
Concrete Example: When asked 'Which is heavier, 1 kg feathers or 1 kg stone?', a SOTA model confidently answers 'Feathers, because they are heavier than a kilogram of any other material.' The proposed system instead responds: 'I'm not sure, but my guess is...'
Key Novelty
Calibrator-Controlled Chatbot Pipeline
  • Train a 'Calibrator' model that predicts the probability a generated answer is correct based on the model's internal states
  • Fine-tune the dialogue agent to accept control tokens (<HI>, <LO>, <DK>) that dictate the level of certainty expressed in the text
  • At inference time, use the Calibrator's prediction to select the appropriate control token, forcing the model to verbalize doubt when it is likely wrong
Architecture
Architecture Figure Figure 1
The proposed 'calibrator-controlled chatbot' pipeline.
Evaluation Highlights
  • Correctness of linguistically confident answers (<HI>) increased from 13.7% (vanilla) to 38.9% (calibrator-controlled) on TriviaQA
  • The controlled model maintains 88.46% of originally correct answers when forced to generate confident text, compared to 56.81% for naive style control
  • Off-topic (OT) responses reduced significantly from 2.4% to 0.2% in the calibrated system
Breakthrough Assessment
7/10
Simple but highly effective approach to a critical problem (overconfidence). While it doesn't solve hallucination, it makes systems significantly safer by aligning language with likely accuracy.
×