ECE: Expected Calibration Error—a metric measuring the difference between predicted confidence and actual accuracy; lower is better
AUROC: Area Under the Receiver Operating Characteristic Curve—a metric measuring how well the confidence score distinguishes between correct and incorrect answers
Logarithmic Scoring Rule: A strictly proper scoring rule where the reward is log(p) if the answer is correct and log(1-p) if it is incorrect, with p the reported probability of being correct; it incentivizes honest probability reporting
PPO: Proximal Policy Optimization—a reinforcement learning algorithm that updates a policy in small, stable steps using a clipped objective
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices
RLHF: Reinforcement Learning from Human Feedback—training models using rewards derived from human preferences
Proper Scoring Rule: A scoring function whose expected reward is maximized by reporting the true probability; the rule is strictly proper if the true probability is the only maximizer
LACIE: Listener-Aware Confidence Estimation—a DPO-based baseline that optimizes confidence by simulating speaker-listener interactions
Trained Probe: A baseline method that trains a separate classifier on the model's internal hidden states to predict answer correctness
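
The metric and scoring-rule entries above (ECE, AUROC, Logarithmic Scoring Rule) can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the evaluation code of any particular paper: the function names, the 10 equal-width bins for ECE, the pairwise (rank-based) AUROC estimate, and the clipping constant eps are all choices made here for the example.

```python
import math


def log_score(p, correct, eps=1e-12):
    """Logarithmic scoring rule: log(p) if correct, log(1-p) if incorrect.

    p is clipped to [eps, 1-eps] (an assumption of this sketch) to avoid log(0).
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) if correct else math.log(1.0 - p)


def expected_log_score(p_reported, p_true):
    """Expected log score when the answer is correct with probability p_true."""
    return (p_true * log_score(p_reported, True)
            + (1.0 - p_true) * log_score(p_reported, False))


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |avg confidence - accuracy|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece


def auroc(confidences, correct):
    """AUROC as the probability that a correct answer outranks an incorrect one.

    Ties count as half a win. This pairwise form is O(n^2), fine for a sketch.
    """
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Honest reporting maximizes the expected log score: search a grid of reports.
    grid = [i / 100 for i in range(1, 100)]
    best = max(grid, key=lambda p: expected_log_score(p, p_true=0.7))
    print("best report when true probability is 0.7:", best)

    conf = [0.9, 0.9, 0.9, 0.9]
    hits = [True, True, True, False]
    print("ECE:", expected_calibration_error(conf, hits))
    print("AUROC:", auroc([0.9, 0.8, 0.3, 0.6], [True, True, False, False]))
```

The grid search shows why the logarithmic rule is called strictly proper: among the candidate reports, the expected reward peaks at the true probability. In the ECE example, four answers reported at 0.9 confidence with three correct leave a gap of about 0.15 between confidence and accuracy.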