Hallucination: The generation of content that contradicts user input, context, or established real-world facts
Contrastive Decoding: A decoding strategy that determines next-token probabilities by contrasting the logits of a strong model against a weak (or amateur) model
SFT: Supervised Fine-Tuning, i.e., training a pre-trained model on task-specific input-output pairs to adapt it to a downstream task
FActScore: An evaluation metric that breaks generated text (e.g., biographies) into atomic facts, verifies each one against a knowledge source (e.g., Wikipedia), and reports the fraction of facts that are supported
TruthfulQA: A benchmark designed to measure whether language models generate truthful answers to questions
Logits: The raw, unnormalized scores output by the last layer of a neural network before applying softmax
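Adaptive Plausibility Constraint: A filtering mechanism in ICD (Induce-then-Contrast Decoding), adopted from Contrastive Decoding, that restricts contrastive scoring to tokens whose base-model probability is at least a fixed fraction of the most likely token's probability; all other tokens are excluded from selection, which preserves fluency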
Behavior Cloning: A phenomenon where a model learns to mimic the surface form of the training data (e.g., answering every question) without learning the underlying logic or truthfulness
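The Contrastive Decoding entry can be illustrated with a minimal sketch for a single decoding step. It assumes the contrastive score is the plain difference between the strong model's and the weak model's log-probabilities, with greedy selection; the function names and toy logits are hypothetical, and real systems may weight the two terms differently.

```python
import math

def log_softmax(logits):
    """Normalize raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def contrastive_scores(strong_logits, weak_logits):
    """Per-token score: strong-model log-prob minus weak-model log-prob."""
    strong_lp = log_softmax(strong_logits)
    weak_lp = log_softmax(weak_logits)
    return [s - w for s, w in zip(strong_lp, weak_lp)]

def contrastive_next_token(strong_logits, weak_logits):
    """Greedy pick: index of the token with the highest contrastive score."""
    scores = contrastive_scores(strong_logits, weak_logits)
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy 3-token vocabulary: both models favor token 0, but the weak model favors it more.
strong = [2.0, 1.5, 0.0]
weak = [2.5, 0.5, 0.0]
print(contrastive_next_token(strong, weak))  # → 1 (greedy decoding on `strong` alone picks 0)
```

The contrast rewards tokens the strong model prefers more than the weak one does. On its own it can also promote tokens that both models consider unlikely, which is what the Adaptive Plausibility Constraint guards against.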
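The SFT entry can be made concrete with a minimal sketch of the usual training objective: the negative log-likelihood of the output tokens given the input. Computing the loss only over output tokens (ignoring the input half of the pair) is a common convention assumed here, not something the definition specifies. `sft_loss` and its arguments are hypothetical, and the per-token log-probabilities are supplied directly rather than produced by a real model.

```python
def sft_loss(token_logprobs, is_output):
    """Mean negative log-likelihood over the output (target) tokens of one example.

    token_logprobs[i] is the log-probability the model assigned to the actual
    token at position i; is_output[i] marks whether that token belongs to the
    output half of the input-output pair. Input tokens do not contribute.
    """
    target = [lp for lp, out in zip(token_logprobs, is_output) if out]
    if not target:
        raise ValueError("example has no output tokens")
    return -sum(target) / len(target)

# Two input tokens followed by two output tokens.
logprobs = [-1.2, -0.7, -0.5, -0.1]
mask = [False, False, True, True]
print(sft_loss(logprobs, mask))  # → 0.3
```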
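The final FActScore step, turning per-fact verdicts into a score, reduces to a simple fraction. In this sketch the fact decomposition is written by hand and the verifier is a toy set lookup. Both are hypothetical stand-ins for the model-based extraction and retrieval-plus-judgment steps the real metric uses.

```python
def factscore(atomic_facts, is_supported):
    """Fraction of a generation's atomic facts that the verifier judges supported."""
    if not atomic_facts:
        raise ValueError("no atomic facts to score")
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

# Toy knowledge source standing in for retrieval over Wikipedia.
knowledge = {
    "Marie Curie was born in Warsaw.",
    "Marie Curie won two Nobel Prizes.",
}
facts = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie was born in 1900.",  # not in the toy knowledge source
]
print(factscore(facts, lambda fact: fact in knowledge))  # → 0.6666666666666666
```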
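The Logits entry mentions the softmax that follows the last layer. This short sketch shows how raw logits become a probability distribution. Subtracting the maximum logit first is a standard numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(logits):
    """Convert raw, unnormalized logits into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```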
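The Adaptive Plausibility Constraint entry can be sketched as a mask applied before the contrast. This sketch assumes the common formulation: keep a token only if its base-model probability is at least `alpha` times the top token's probability, and score kept tokens by the plain log-probability difference. `alpha=0.1` is an illustrative value, and the function names are hypothetical. ICD's exact scoring may weight the terms differently.

```python
import math

def log_softmax(logits):
    """Normalize raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def constrained_contrastive_scores(base_logits, weak_logits, alpha=0.1):
    """Contrastive scores restricted to tokens the base model finds plausible.

    A token is kept only if p_base(token) >= alpha * max p_base; every other
    token gets -inf so it can never be selected.
    """
    base_lp = log_softmax(base_logits)
    weak_lp = log_softmax(weak_logits)
    # Compare in log space to avoid underflow: log(alpha) + max base log-prob.
    cutoff = math.log(alpha) + max(base_lp)
    return [b - w if b >= cutoff else -math.inf for b, w in zip(base_lp, weak_lp)]

def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])

# Token 2 has the largest raw contrast but is implausible under the base model.
base = [3.0, 2.0, -3.0]
weak = [3.0, 1.0, -6.0]
scores = constrained_contrastive_scores(base, weak)
print(argmax(scores))  # → 1
```

Without the mask, token 2 would win because the weak model dislikes it even more than the base model does, although the base model itself gives it well under 1% probability.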