Hallucination: Undesirable LLM outputs that are incorrect, unfaithful to input, or internally inconsistent
Calibration: Adjusting model confidence scores so that predicted probabilities match the observed frequency of correctness (e.g., predictions made with 0.8 confidence are correct about 80% of the time)
Multicalibration: A stronger form of calibration that must hold not just on average, but within each of a collection of identified (possibly overlapping) subpopulations or groups in the data
Inverse Perplexity: A metric derived from the model's logits, equal to the inverse of the exponentiated average negative log-likelihood of the generated tokens (equivalently, the geometric mean of the token probabilities); used as a measure of model confidence
SelfCheckGPT: A hallucination detection method that checks consistency between a generated response and multiple stochastically sampled alternative responses
NLI: Natural Language Inference; the task of determining whether a hypothesis is true (entailment), false (contradiction), or undetermined (neutral) given a premise
DeBERTa: Decoding-enhanced BERT with disentangled attention—a transformer model often used for NLI tasks
Logit: A raw, unnormalized score produced by the last layer of a neural network; softmax converts the vector of logits into probabilities
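AUC-ROC: Area Under the Receiver Operating Characteristic curve; a performance metric for binary classifiers that summarizes the trade-off between true positive rate and false positive rate across all decision thresholds (0.5 corresponds to random guessing, 1.0 to perfect ranking)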
ECE: Expected Calibration Error; the average absolute difference between predicted confidence and actual accuracy across confidence bins, with each bin weighted by the fraction of samples it contains
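
Two of the quantities defined above, Inverse Perplexity and ECE, can be computed in a few lines. The following is a minimal sketch rather than a reference implementation: the function names, the use of per-token log-probabilities as input, and the choice of 10 equal-width confidence bins are all assumptions made for illustration and are not specified in the definitions.

```python
import math
from typing import Sequence


def inverse_perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean log-probability), i.e. 1 / perplexity.

    Assumes `token_logprobs` holds the natural-log probability the model
    assigned to each generated token (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return 1.0 / math.exp(avg_nll)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,  # assumed default; the bin count is a free choice
) -> float:
    """ECE with equal-width bins over [0, 1] (an assumed binning scheme)."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for c, y in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += y
        count[k] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[k] / count[k]
        accuracy = hit_sum[k] / count[k]
        ece += (count[k] / n) * abs(avg_conf - accuracy)
    return ece
```

Under these assumptions, a two-token response with token probabilities 0.25 and 1.0 has an inverse perplexity of about 0.5 (their geometric mean), and a set of predictions that are all made with 0.8 confidence and are correct 80% of the time has an ECE of about 0.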