
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks

Wenbo Pan, Jie Xu, Qiguang Chen, Junhao Dong, Libo Qin, Xinfeng Li, Haining Yu, Xiaohua Jia
Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China; Harbin Institute of Technology, Harbin, China; College of Computing and Data Science, Nanyang Technological University, Singapore; School of Computer Science and Engineering, Central South University, Changsha, China
arXiv (2025)
Factuality Benchmark QA

📝 Paper Summary

Factuality Evaluation Hallucination Suppression Model Calibration
The Refusal Index measures an LLM's ability to refuse unknown questions by calculating the rank correlation between its refusal probability and its error probability using a two-pass evaluation.
Core Problem
Existing factuality metrics do not measure whether a model refuses questions because of genuine knowledge gaps. Simple refusal rates are skewed by a model's overall tendency to refuse, while calibration metrics evaluate proxy signals (verbalized confidence, auxiliary scorers) rather than the model's own refusal behavior.
Why it matters:
  • LLMs frequently hallucinate answers with high confidence, necessitating reliable refusal mechanisms for safe deployment
  • Current metrics such as F-score or Weighted Score are unstable: they shift substantially with a model's refusal threshold rather than tracking its actual knowledge boundary
  • Standard calibration metrics such as Expected Calibration Error (ECE) rely on verbalized confidence or auxiliary models, which often diverge from the model's actual generation behavior
Concrete Example: A model instructed to be conservative might achieve a high score simply by refusing everything (high refusal rate), even if it knows the answers. Conversely, a model might refuse random questions rather than difficult ones. Existing metrics struggle to distinguish a model that refuses *because* it doesn't know from a model that refuses due to a conservative prompt.
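To make this concrete, consider a hypothetical 100-question set on which a model answers 60 correctly when forced. The sketch below assumes a SimpleQA-style F-score (harmonic mean of accuracy over all questions and accuracy over attempted questions); the paper's exact definitions of F-score and Weighted Score are not reproduced here and may differ, and all counts are illustrative.

```python
def f_score(correct: int, incorrect: int, refused: int) -> float:
    """Assumed SimpleQA-style F-score: harmonic mean of overall accuracy
    and accuracy on the questions the model chose to attempt."""
    total = correct + incorrect + refused
    attempted = correct + incorrect
    overall = correct / total
    given_attempted = correct / attempted if attempted else 0.0
    if overall + given_attempted == 0:
        return 0.0
    return 2 * overall * given_attempted / (overall + given_attempted)


# Hypothetical model: knows 60 of 100 answers.
never_refuse = f_score(correct=60, incorrect=40, refused=0)    # ≈ 0.60
refuse_unknown = f_score(correct=60, incorrect=0, refused=40)  # ≈ 0.75, refuses exactly the 40 it would miss
refuse_random = f_score(correct=36, incorrect=24, refused=40)  # ≈ 0.45, refuses 40% of each group at random

print(round(never_refuse, 2), round(refuse_unknown, 2), round(refuse_random, 2))
```

The last two cases share a 40% refusal rate yet differ by 0.30, and the first case differs from both, so the score reflects where the refusal threshold sits as much as what the model knows.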
Key Novelty
Refusal Index (RI) via Two-Pass Evaluation
  • Defines knowledge-aware refusal as the Spearman rank correlation between a model's likelihood to refuse and its likelihood to be wrong, independent of the absolute refusal rate
  • Uses a lightweight two-pass process: Pass 1 permits refusal and records whether the model refuses; Pass 2 forces an answer and records whether it is correct. These binary signals are then fitted to a Gaussian copula to estimate the correlation between the underlying latent refusal and error tendencies
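A minimal sketch of the estimation step, assuming the Gaussian copula is fitted to the 2×2 refusal-by-error table as a tetrachoric (latent bivariate normal) correlation and then converted to Spearman's rho with the bivariate-normal identity ρ_S = (6/π)·arcsin(ρ/2). The names `bivariate_normal_cdf` and `refusal_index`, and the Simpson-rule and bisection numerics, are hypothetical illustration choices rather than the paper's implementation. Inputs are one Pass 1 refusal flag and one Pass 2 error flag per question.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1)


def bivariate_normal_cdf(h: float, k: float, rho: float, steps: int = 200) -> float:
    """P(U <= h, V <= k) for a standard bivariate normal with correlation rho.

    Uses Plackett's identity dPhi2/drho = phi2(h, k; rho): start from the
    independent case Phi(h) * Phi(k) and integrate the density over [0, rho]
    with composite Simpson's rule. Assumes |rho| < 1.
    """
    def density(r: float) -> float:
        s = 1.0 - r * r
        quad = (h * h - 2.0 * r * h * k + k * k) / (2.0 * s)
        return math.exp(-quad) / (2.0 * math.pi * math.sqrt(s))

    base = _STD_NORMAL.cdf(h) * _STD_NORMAL.cdf(k)
    if rho == 0.0:
        return base
    n = steps + (steps % 2)  # Simpson's rule needs an even number of intervals
    dr = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * dr)
    return base + total * dr / 3.0


def refusal_index(refused: list, wrong: list, tol: float = 1e-6) -> float:
    """Hypothetical RI estimator: latent (tetrachoric) correlation between
    Pass 1 refusal flags and Pass 2 error flags, reported as Spearman's rho."""
    n = len(refused)
    if n == 0 or n != len(wrong):
        raise ValueError("need equal-length, non-empty flag lists")
    p_refuse = sum(map(bool, refused)) / n
    p_wrong = sum(map(bool, wrong)) / n
    p_both = sum(1 for r, w in zip(refused, wrong) if r and w) / n
    if not (0.0 < p_refuse < 1.0 and 0.0 < p_wrong < 1.0):
        raise ValueError("each pass must contain both outcomes")

    # Latent thresholds reproduce the observed rates:
    # P(refuse) = Phi(h), P(wrong) = Phi(k), P(both) = Phi2(h, k; rho).
    h = _STD_NORMAL.inv_cdf(p_refuse)
    k = _STD_NORMAL.inv_cdf(p_wrong)

    # Phi2 increases with rho, so bisection finds the rho matching P(both).
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bivariate_normal_cdf(h, k, mid) < p_both:
            lo = mid
        else:
            hi = mid
    rho = (lo + hi) / 2.0

    # Spearman's rho of a bivariate normal with Pearson correlation rho.
    return (6.0 / math.pi) * math.asin(rho / 2.0)
```

In this formulation the thresholds h and k absorb the marginal refusal and error rates, so making a model refuse more often moves h without, in principle, changing the latent ρ. That separation is what lets the index stay insensitive to the absolute refusal rate.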
Evaluation Highlights
  • RI demonstrates ~70% lower variability than heuristic metrics (F-score, Weighted Score) when tested on the same model across different refusal-inducing prompts
  • RI achieves 85% correlation with computationally expensive sampling-based calibration methods (AUROC on P(Answering)), validating it as a faithful calibration proxy
  • Model family is the strongest predictor of RI, with rankings that stay consistent regardless of model scale or instruction tuning
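The sampling-based reference in the second highlight scores how well a per-question probability of answering separates questions the model gets right from those it gets wrong. The sketch below computes AUROC in its Mann-Whitney form, assuming P(Answering) has already been estimated for each question (for example, as the fraction of sampled responses that are not refusals) and that Pass 2 supplies the correctness labels; the function name, the estimation recipe, and the sample values are illustrative assumptions, not the paper's procedure.

```python
def auroc(scores: list, labels: list) -> float:
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-question P(Answering) and Pass 2 correctness.
p_answer = [0.4, 0.6, 0.7, 0.1]
correct = [True, False, True, False]
print(auroc(p_answer, correct))  # 0.75
```

The quadratic pairwise loop keeps the definition visible; for large question sets a rank-based computation gives the same value faster. The expense of the baseline comes mainly from the many generations needed to estimate P(Answering), whereas the two-pass setup needs one refusal-allowed and one forced answer per question.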
Breakthrough Assessment
8/10
Provides a mathematically grounded, robust metric for a critical safety capability (refusal). Solves the long-standing issue of metric instability caused by varying refusal rates.