MSP: Maximum Softmax Probability, a baseline uncertainty measure based on the probability of the most likely token at each generation step.
EigenScore: The proposed metric, calculated as the log-determinant (the sum of the log eigenvalues) of the covariance matrix of the sentence embeddings of multiple sampled responses.
LogDet: Logarithm of the determinant of a matrix.
Differential Entropy: The entropy of a continuous random variable; the paper shows that EigenScore corresponds to this quantity, up to constants, when the embeddings follow a Gaussian distribution.
AUROC: Area Under the Receiver Operating Characteristic curve—a standard metric for binary classification performance.
Internal States: The hidden layer representations (embeddings) within the LLM, specifically the penultimate layer in this work.
Feature Clipping: Truncating the values of hidden neuron activations that fall into extreme percentiles (e.g., top/bottom 0.2%) to reduce model overconfidence.
Lexical Similarity: A baseline method measuring consistency by comparing word overlap (e.g., ROUGE scores) between generated responses.
SelfCheckGPT: A strong baseline method for hallucination detection that checks consistency among sampled responses.
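
As an illustration of the EigenScore and feature clipping entries above, the following is a minimal NumPy sketch, not the paper's reference implementation. The function names `eigenscore` and `clip_features`, the ridge term `alpha`, the division by the number of responses, the centering step, and the use of a separate `reference` array to estimate the percentile thresholds are all assumptions made for this example. The embeddings are random placeholders rather than real LLM hidden states.

```python
import numpy as np


def eigenscore(embeddings, alpha=1e-3):
    """Log-determinant of the regularized covariance of K sentence embeddings.

    embeddings: array of shape (K, d), one row per sampled response.
    alpha: small ridge term (an assumption here) that keeps the matrix
           full rank so the log-determinant stays finite.
    """
    z = np.asarray(embeddings, dtype=float)
    k = z.shape[0]
    # Center each embedding across its feature dimension (assumed convention).
    z = z - z.mean(axis=1, keepdims=True)
    # K x K covariance (Gram) matrix between the sampled responses.
    cov = z @ z.T
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(k))
    # LogDet = sum of log eigenvalues; dividing by K is an assumed normalization.
    return float(np.sum(np.log(eigvals)) / k)


def clip_features(activations, reference, pct=0.2):
    """Clamp activations to the [pct, 100 - pct] percentile range of `reference`."""
    low = np.percentile(reference, pct)
    high = np.percentile(reference, 100.0 - pct)
    return np.clip(activations, low, high)


rng = np.random.default_rng(0)
diverse = rng.normal(size=(5, 64))           # five unrelated placeholder embeddings
consistent = np.tile(diverse[:1], (5, 1))    # five identical embeddings
score_diverse = eigenscore(diverse)
score_consistent = eigenscore(consistent)

reference = rng.normal(size=10_000)          # stand-in for stored activations
clipped = clip_features(np.array([-100.0, 0.0, 100.0]), reference)
```

In this sketch, identical responses yield a nearly rank-one covariance matrix, so most eigenvalues collapse to `alpha` and the score drops. Semantically diverse responses spread the eigenvalues out and raise the score, which is the behavior the metric relies on to flag inconsistent (and possibly hallucinated) generations.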