Type I error: The error of incorrectly classifying a hallucination/uncertain answer as truthful/certain (False Positive in this context).
Type II error: The error of incorrectly classifying a truthful/certain answer as uncertain (False Negative), leading to unnecessary abstention.
Neyman-Pearson (NP) classification: A binary classification framework that minimizes Type II error while keeping Type I error bounded by a specific level alpha.
Conformal prediction: A distribution-free statistical framework that uses held-out calibration data to attach finite-sample confidence guarantees to new predictions.
Covariate shift: A situation where the distribution of input data (questions) changes between the calibration phase and the testing phase.
Density ratio estimation: A method to correct for distribution shifts by weighting samples based on the ratio of test density to calibration density.
Greedy decoding: A deterministic text generation method that always selects the highest probability token at each step.
R-Tuning: Refusal-Aware Instruction Tuning, a baseline method that fine-tunes models to refuse questions they cannot answer correctly.
Sample complexity: The number of data samples required to achieve a desired level of statistical performance.
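
To make the Type I / Type II trade-off in the Neyman-Pearson entry concrete, here is a minimal Python sketch of picking a certainty threshold on calibration data. It is a plug-in illustration, not the method of any particular paper: the function names `np_threshold` and `error_rates`, the convention that a higher score means "more certain", and the toy scores are all assumptions made for the example. It bounds only the empirical Type I error and omits the finite-sample correction a full NP procedure would add.

```python
import numpy as np

def np_threshold(neg_scores, alpha):
    """Pick a threshold t so that at most a fraction alpha of the
    calibration scores from hallucinated/uncertain answers exceed t.
    Assumes 0 < alpha < 1 and that higher scores mean 'more certain'."""
    s = np.sort(np.asarray(neg_scores, dtype=float))
    n = len(s)
    # Allow at most floor(alpha * n) hallucinated answers above the threshold.
    k = int(np.floor(alpha * n))
    # The (n - k)-th smallest score leaves at most k scores strictly above it.
    return s[n - k - 1]

def error_rates(neg_scores, pos_scores, t):
    """Empirical Type I and Type II error of the rule 'answer iff score > t'."""
    # Type I: a hallucinated/uncertain answer is accepted as certain.
    type1 = float(np.mean(np.asarray(neg_scores) > t))
    # Type II: a truthful/certain answer is rejected (unnecessary abstention).
    type2 = float(np.mean(np.asarray(pos_scores) <= t))
    return type1, type2

# Toy calibration scores (hypothetical values, for illustration only).
neg = np.arange(1, 11) / 10          # scores of hallucinated answers
pos = np.array([0.5, 0.85, 0.9, 0.95])  # scores of truthful answers
alpha = 0.2

t = np_threshold(neg, alpha)
type1, type2 = error_rates(neg, pos, t)
print(t, type1, type2)
```

Lowering `alpha` pushes the threshold up, which reduces Type I error at the cost of more abstentions (higher Type II error). Under covariate shift, one option would be to weight each calibration score by an estimated density ratio before taking the quantile; that step is left out of this sketch.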