Uncertainty Quantification (UQ): Methods for estimating how confident a model is in its own predictions; the resulting scores are often used to predict whether an output is correct
CCP: Claim-Conditioned Probability, the proposed metric, which measures the probability of a claim's meaning given its context by marginalizing over the surface forms that could express that meaning
NLI: Natural Language Inference, the task of determining whether a premise sentence entails, contradicts, or is neutral with respect to a hypothesis sentence
FactScore: An automatic evaluation metric that breaks text into atomic claims, verifies each one against a knowledge source such as Wikipedia, and reports the fraction of claims that are supported
White-box model: A model where internal parameters and token probability distributions are accessible (unlike API-only black-box models)
Beam search: A decoding algorithm that keeps the k most probable partial sequences at each generation step, approximating (but not guaranteeing) the most probable overall sequence
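AUC-ROC: Area Under the Receiver Operating Characteristic Curve, a threshold-independent metric for binary classification that summarizes the trade-off between true-positive and false-positive rates across all decision thresholds (0.5 = chance level, 1.0 = perfect ranking)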
Surface form uncertainty: Uncertainty about which specific wording (e.g., which of several synonyms) to use to express a concept; it does not affect factual correctness
Atomic claim: A simple, indivisible statement of fact extracted from a longer text (e.g., 'He was born in 1990')
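
The CCP entry can be made concrete with a minimal Python sketch. It assumes one possible formulation, not necessarily the paper's exact one. For each token of a claim, a white-box model supplies alternative tokens with their probabilities. An NLI model (not shown) then labels whether swapping in each alternative entails, contradicts, or is neutral to the original claim. Under this assumption, token-level CCP is the probability mass of entailing (meaning-preserving) alternatives divided by the mass of alternatives that entail or contradict. Surface-form variants therefore do not lower the score, and the claim-level value is taken as the product over the claim's tokens. The `Alternative` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Literal, Sequence

NLILabel = Literal["entail", "contradict", "neutral"]


@dataclass
class Alternative:
    """One candidate token at a given position (hypothetical structure)."""
    prob: float    # P(alternative | prefix) from a white-box model
    nli: NLILabel  # NLI label: claim with this alternative vs. the original claim


def token_ccp(alternatives: Sequence[Alternative]) -> float:
    """Share of probability mass on meaning-preserving alternatives, conditioned
    on alternatives that address the same claim (entail or contradict).
    Neutral alternatives are ignored."""
    same_meaning = sum(a.prob for a in alternatives if a.nli == "entail")
    same_claim = sum(a.prob for a in alternatives if a.nli in ("entail", "contradict"))
    # If no alternative is relevant to the claim, treat the token as uninformative.
    return same_meaning / same_claim if same_claim > 0 else 1.0


def claim_ccp(tokens: Sequence[Sequence[Alternative]]) -> float:
    """Claim-level score (assumed here): product of token-level CCPs."""
    return prod(token_ccp(alts) for alts in tokens)


# Toy example for the claim "He was born in 1990" (all numbers invented).
born = [Alternative(0.7, "entail"),      # the generated token
        Alternative(0.2, "entail"),      # surface-form variant with the same meaning
        Alternative(0.1, "neutral")]     # unrelated continuation
year = [Alternative(0.6, "entail"),      # "1990", the generated token
        Alternative(0.3, "contradict"),  # "1991", a different fact
        Alternative(0.1, "neutral")]
print(round(claim_ccp([born, year]), 3))  # 0.667
```

In this toy case the synonym-like alternative for "born" leaves its token CCP at 1.0, while the competing year "1991" lowers the claim's score. A claim-level uncertainty could then be taken as 1 - CCP.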
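
The Beam search entry can be illustrated with a minimal Python sketch. Here a hypothetical function `next_logprobs` stands in for a white-box model's next-token distribution, and the toy probabilities are invented. The sketch keeps the `beam_width` best partial sequences by summed log-probability. This is why beam search approximates, rather than guarantees, the most probable sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]


def beam_search(
    next_logprobs: Callable[[Seq], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Tuple[Seq, float]]:
    """Return finished sequences with summed log-probabilities, best first."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_logprobs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep only the top `beam_width` candidates; completed ones stop growing.
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # sequences that reached max_len without EOS
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(seq: Seq) -> Dict[str, float]:
    """Invented next-token distributions, returned as log-probabilities."""
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"x": 0.5, "y": 0.5},
        ("B",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(seq, {"</s>": 1.0})  # any other prefix ends the sequence
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_model, beam_width=2, max_len=5)[0][0])  # ('B', 'z', '</s>')
```

With `beam_width=1` the search reduces to greedy decoding. It commits to "A" and returns ('A', 'x', '</s>') with probability 0.3. Width 2 instead finds ('B', 'z', '</s>') with probability 0.36.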
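
The AUC-ROC entry can be illustrated with a short sketch of how a UQ score might be evaluated against correctness labels, such as FactScore-style verdicts on atomic claims. The sketch uses the rank interpretation of AUC-ROC: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. Two choices here are assumptions made for illustration: false claims are the positive class, and 1 - CCP is the score. The data are invented.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC via pairwise ranking: P(score of a positive > score of a negative),
    with ties counted as 0.5. O(n_pos * n_neg), fine for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): score = uncertainty (1 - CCP) per claim; label 1 = claim is false.
uncertainty = [0.9, 0.8, 0.3, 0.1]
is_false = [1, 0, 1, 0]
print(auc_roc(uncertainty, is_false))  # 0.75
```

For real evaluations, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.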