HK+: Hallucination despite Knowledge—the model outputs an incorrect answer even though it contains the correct knowledge in its parameters
HK-: Hallucination due to Knowledge deficiency—the model outputs an incorrect answer because it does not possess the required knowledge
WACK: Wrong Answers despite Correct Knowledge—the proposed automatic framework for generating model-specific datasets containing HK+ and HK- examples
CBQA: Closed-Book Question Answering—answering questions without access to external documents
Snowballing: A phenomenon where a model's prior mistakes (or mistakes in the prompt context) lead to further incorrect generations
Greedy decoding: A deterministic decoding strategy in which, at every generation step, the model selects the single token with the highest probability instead of sampling from the distribution
Linear probe: A simple linear classifier trained on the internal activations (hidden states) of a neural network to predict a specific property
AUC: Area Under the ROC Curve—a performance metric for classification tasks, where 1.0 is perfect and 0.5 is random guessing
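
To make the greedy decoding entry concrete, here is a minimal sketch in Python. It is an illustration only: the bigram table `TOY_BIGRAMS`, the helper `toy_next_token_probs`, and the `<eos>` marker are hypothetical stand-ins for a real language model and its vocabulary, not part of the WACK framework.

```python
from typing import Callable, Dict, List

def greedy_decode(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    eos: str = "<eos>",
    max_new_tokens: int = 10,
) -> List[str]:
    """Repeatedly append the single most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # argmax, no sampling
        if best == eos:
            break
        tokens.append(best)
    return tokens

# Hypothetical toy "model": next-token probabilities depend only on the last token.
TOY_BIGRAMS = {
    "Paris": {"is": 0.7, "was": 0.3},
    "is": {"the": 0.6, "a": 0.4},
    "the": {"capital": 0.55, "city": 0.45},
    "capital": {"<eos>": 0.9, "of": 0.1},
}

def toy_next_token_probs(tokens: List[str]) -> Dict[str, float]:
    # Unknown contexts end the sequence immediately.
    return TOY_BIGRAMS.get(tokens[-1], {"<eos>": 1.0})

print(greedy_decode(toy_next_token_probs, ["Paris"]))
# → ['Paris', 'is', 'the', 'capital']
```

Because the highest-probability token is always chosen, the same prompt always yields the same output, which is what makes greedy decoding convenient for reproducible evaluation.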
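
The linear probe and AUC entries can be illustrated together with a small sketch. Everything here is assumed for illustration: the "hidden states" are synthetic Gaussian vectors rather than real model activations, the label convention (1 for HK+, 0 for HK-) is arbitrary, and the probe is a plain logistic regression fitted by gradient descent, which is one common choice and not necessarily the setup used by the authors.

```python
import math
import random

def train_linear_probe(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression probe (weights w, bias b) by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label  # sigmoid(z) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_score(w, b, x):
    """Linear score; larger means the probe leans towards label 1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def synthetic_hidden_states(n_per_class, dim=4, shift=3.0):
    """Synthetic stand-in for activations: class 1 is shifted along dimension 0."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [random.gauss(0.0, 1.0) for _ in range(dim)]
            x[0] += shift * label
            X.append(x)
            y.append(label)
    return X, y

random.seed(0)
X_train, y_train = synthetic_hidden_states(100)
X_test, y_test = synthetic_hidden_states(50)

w, b = train_linear_probe(X_train, y_train)
auc = roc_auc([probe_score(w, b, x) for x in X_test], y_test)
print(f"held-out AUC: {auc:.3f}")
```

The pairwise definition used in `roc_auc` is equivalent to the area under the ROC curve and makes the two reference points easy to see: a probe that ranks every positive above every negative scores 1.0, while one whose scores carry no information scores about 0.5.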