hallucinations: Generated content that deviates from real-world facts observed during pretraining
DoLa: Decoding by Contrasting Layers, a strategy that subtracts lower-layer logits from final-layer logits to amplify factual signals
logits: Raw, unnormalized scores output by the model before the softmax function converts them into probabilities
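A minimal pure-Python sketch of the softmax that turns logits into probabilities, plus the log-softmax (log-probabilities) that contrastive methods usually work with. The function names are illustrative and not taken from any particular library.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtracting the max logit avoids overflow and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Log-probabilities, computed stably with the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

The ordering of tokens is preserved: the largest logit always receives the largest probability.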
Jensen-Shannon Divergence (JSD): A symmetric, bounded measure of how much two probability distributions differ (zero when they are identical); used here to identify the layer whose output distribution differs most from the final layer's
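A small sketch of JSD computed from its definition: the average KL divergence of each distribution from their midpoint. Values are in nats, so the result lies between 0 and ln 2. The helper names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the average of p and q."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

same = jensen_shannon_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
diff = jensen_shannon_divergence([0.9, 0.05, 0.05], [0.1, 0.45, 0.45])
print(same, diff)
```

Unlike plain KL divergence, JSD is symmetric and stays finite even when one distribution assigns zero probability to a token, because the midpoint m is positive wherever either input is.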
early exit: Obtaining a prediction from an intermediate layer of a neural network rather than processing all the way to the final layer
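Early exit, as used for layer contrast, reuses the model's final output head on an intermediate layer's hidden state. The sketch below assumes a toy model whose head is a plain weight matrix with made-up values; real models typically also apply the final normalization layer before the head, which is omitted here for brevity.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def lm_head(hidden, weight):
    """Project a hidden vector to vocabulary logits: one dot product per vocabulary row."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

def early_exit_distributions(hidden_states, weight):
    """Apply the same shared output head to every layer's hidden state."""
    return [softmax(lm_head(h, weight)) for h in hidden_states]

# Hypothetical toy values: 3 layers, hidden size 2, vocabulary of 3 tokens.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
hidden_states = [[0.1, 0.2], [0.5, 1.0], [2.0, 0.5]]  # layer 1 .. layer 3 (final)

for layer, dist in enumerate(early_exit_distributions(hidden_states, W), start=1):
    print(layer, [round(p, 3) for p in dist])
```

In this toy example the top token changes between the middle layer and the final layer, which is the kind of shift layer-contrast methods try to exploit.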
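premature layer: An early or middle transformer layer contrasted with the mature layer because it is expected to capture linguistic patterns but not full factual knowledge; DoLa chooses it dynamically from a set of candidate layers as the one whose output distribution has the highest JSD from the mature layer's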
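A sketch of dynamic premature-layer selection: among the candidate layers, pick the one whose early-exit distribution diverges most (by JSD) from the mature layer's. The distributions, the candidate set, and the function name `select_premature_layer` are hypothetical.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence in nats."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_premature_layer(layer_dists, candidates):
    """Return the candidate index whose distribution diverges most from the final (mature) layer."""
    mature = layer_dists[-1]
    return max(candidates, key=lambda j: jsd(layer_dists[j], mature))

# Hypothetical next-token distributions from 4 layers; the last one is the mature layer.
layer_dists = [
    [0.40, 0.35, 0.25],
    [0.20, 0.70, 0.10],
    [0.60, 0.30, 0.10],
    [0.80, 0.15, 0.05],
]
print(select_premature_layer(layer_dists, candidates=[0, 1, 2]))  # → 1
```

Because the choice is made per decoding step, different tokens can be contrasted against different premature layers.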
mature layer: The final transformer layer, assumed to contain the most complete semantic and factual information
contrastive decoding: A decoding method that favors tokens with high probability under an 'expert' model but low probability under an 'amateur' model; DoLa adapts this to layers within one model
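A minimal sketch of the contrastive score, log p_expert minus log p_amateur, using made-up distributions over a three-token vocabulary. It assumes every probability is strictly positive so the logarithms are defined; no plausibility filtering is applied here.

```python
import math

def contrastive_scores(expert_probs, amateur_probs):
    """Score each token by log p_expert - log p_amateur (all probabilities must be > 0)."""
    return [math.log(e) - math.log(a) for e, a in zip(expert_probs, amateur_probs)]

vocab = ["Paris", "Lyon", "the"]
expert = [0.40, 0.10, 0.50]   # hypothetical expert (or mature-layer) distribution
amateur = [0.10, 0.10, 0.80]  # hypothetical amateur (or premature-layer) distribution

scores = contrastive_scores(expert, amateur)
greedy = vocab[max(range(len(vocab)), key=lambda i: expert[i])]
best = vocab[max(range(len(vocab)), key=lambda i: scores[i])]
print(greedy, best)  # → the Paris
```

Greedy decoding from the expert alone would pick the generic token "the", while the contrast promotes "Paris", which the expert prefers far more strongly than the amateur does.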
adaptive plausibility constraint (APC): A filtering rule that sets a token's probability to zero when its probability under the expert/mature model falls below a fraction α of that model's highest token probability, preventing implausible outputs
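A sketch of APC combined with the layer contrast: tokens whose mature-layer probability is below alpha times the top probability get a score of negative infinity (probability zero after softmax), and the remaining tokens are scored by the mature-minus-premature log-probability difference. The value alpha = 0.1, the toy distributions, and the function name `apc_contrast` are illustrative assumptions.

```python
import math

def apc_contrast(mature, premature, alpha=0.1):
    """Contrast log-probabilities, keeping only tokens that pass the plausibility constraint."""
    threshold = alpha * max(mature)
    scores = []
    for pm, pp in zip(mature, premature):
        if pm >= threshold:
            scores.append(math.log(pm) - math.log(pp))
        else:
            scores.append(float("-inf"))  # implausible under the mature layer: excluded
    return scores

def softmax(scores):
    m = max(scores)  # finite, since the top mature token always passes the constraint
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

mature = [0.60, 0.36, 0.04]
premature = [0.50, 0.499, 0.001]

probs = softmax(apc_contrast(mature, premature))
print([round(p, 3) for p in probs])
```

Without the constraint (alpha = 0.0), the rare third token would win because its probability ratio is largest (0.04 / 0.001), even though the mature layer considers it unlikely; APC removes it before the contrast can amplify it.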
TruthfulQA: A benchmark designed to measure whether language models generate false answers that mimic human misconceptions
Chain-of-Thought (CoT): A prompting strategy where the model generates intermediate reasoning steps before the final answer