contextual entropy: A metric quantifying how 'spread out' a candidate token's activation is across the input prompt tokens; lower entropy indicates sharper, more specific focus.
in-context sharpness: The phenomenon where correct tokens exhibit strong, distinct activations with specific tokens in the context prompt at intermediate layers.
DoLa: Decoding by Contrasting Layers—a baseline method that contrasts logits from different layers to amplify factual signals.
ITI: Inference-Time Intervention—a baseline that shifts model activations along 'truthful' directions discovered via probing.
activation score: The projection of a hidden state onto the vocabulary embedding of a specific token, measuring how likely that hidden state encodes the token.
informative layer: A specific intermediate transformer layer (e.g., layer 26 in LLaMA-2-7B) selected for calculating activation patterns, believed to contain factual associations.
Truth*Info: A composite metric for TruthfulQA that multiplies the truthfulness score by the informativeness score to penalize non-answers like 'I have no comment'.