MIL: Multiple Instance Learning, a form of weakly supervised learning in which labels are assigned to bags of instances (here, sequences) rather than to individual instances (here, tokens); the goal is to predict bag labels, typically by identifying the key instances that determine them
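As a rough illustration of the MIL setup, the sketch below pools per-token (instance) scores into a single sequence (bag) score. It assumes the token scores already exist. The function names are hypothetical, and max-pooling and top-k mean pooling are common generic choices, not necessarily the pooling used by the method this glossary accompanies.

```python
from typing import List


def bag_score_max(instance_scores: List[float]) -> float:
    """Bag (sequence) score = score of its most suspicious instance (token)."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def bag_score_topk_mean(instance_scores: List[float], k: int) -> float:
    """Bag score = mean of the k highest instance scores (softer than max)."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(instance_scores, reverse=True)[:k]
    return sum(top) / len(top)


# Hypothetical per-token scores for one generated response.
token_scores = [0.05, 0.10, 0.92, 0.40, 0.08]
print(bag_score_max(token_scores))             # 0.92
print(bag_score_topk_mean(token_scores, k=2))  # mean of 0.92 and 0.40
```

Max-pooling lets a single strongly flagged token mark the whole sequence, which matches the MIL assumption that one key instance can determine the bag label; top-k averaging trades some of that sensitivity for robustness to a single noisy token score.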
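AUROC: Area Under the Receiver Operating Characteristic curve, a threshold-independent metric for binary classification that summarizes the trade-off between true positive rate and false positive rate across all decision thresholds; 0.5 corresponds to random ranking and 1.0 to perfect separation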
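AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the normalized Mann-Whitney U statistic). The minimal sketch below computes it directly from that definition, counting ties as one half; the labels and scores are hypothetical. For real workloads, scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a correct ranking.
    This pairwise form is O(P * N); it is meant for illustration only.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = hallucinated response, 0 = faithful response (hypothetical detector scores).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.6, 0.1]
print(auroc(labels, scores))  # 7 of 9 positive-negative pairs ranked correctly
```

Because only the ordering of scores matters, no decision threshold has to be chosen, which is why AUROC is a common way to compare uncertainty or hallucination scores that live on different scales.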
hallucination: Unfaithful or incorrect generations produced by an LLM that deviate from facts or the input context
hidden states: The internal vector representations of tokens computed at each layer of a neural network (here, an LLM), before the final output layer
predictive uncertainty: A measure of how unsure a model is about its prediction, often computed from the output probabilities (obtained by applying a softmax to the logits) or from the entropy of that distribution
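A minimal sketch of two common logit-based uncertainty signals, assuming raw next-token logits are available: the maximum softmax probability and the entropy of the softmax distribution. The logits below are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats: 0 when fully confident, log(V) when uniform over V options."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token logits over a 4-token vocabulary.
confident = softmax([8.0, 1.0, 0.5, 0.2])
unsure = softmax([1.0, 1.0, 1.0, 1.0])
print(max(confident), entropy(confident))  # high max-probability, low entropy
print(max(unsure), entropy(unsure))        # 0.25 and log(4) ≈ 1.386
```

Subtracting the maximum logit before exponentiating does not change the softmax result but avoids overflow for large logits.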
semantic entropy: A metric that measures uncertainty by sampling multiple answers, clustering them by meaning, and calculating entropy over the resulting clusters
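The sketch below shows the cluster-then-entropy structure of semantic entropy under simplifying assumptions: cluster probabilities are estimated from sample counts rather than from model likelihoods, and a hypothetical `toy_same_meaning` function stands in for the meaning-equivalence check (in practice, something stronger such as a bidirectional entailment test with an NLI model).

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       same_meaning: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily assign each answer to the first cluster whose representative it matches."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy (nats) over meaning clusters, using cluster frequencies as probabilities."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for a meaning-equivalence check.

    It only normalizes case, trailing punctuation, and one fixed phrase;
    a real system would use a semantic test such as bidirectional entailment.
    """
    def normalize(s: str) -> str:
        return s.lower().strip(" .").replace("the city of ", "")
    return normalize(a) == normalize(b)


samples = ["Paris", "paris.", "The city of Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, toy_same_meaning))  # two clusters of sizes 4 and 1
```

Four of the five samples share one meaning, so the entropy is low, whereas an entropy computed over distinct strings would be higher because of purely lexical variation.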
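perplexity: A measure of how well a probability model predicts a sample, defined as the exponential of the average negative log-likelihood per token; in LLMs, it reflects how "surprised" the model is by a text, with lower values meaning the text was more predictable to the model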
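Concretely, perplexity over N tokens is exp(-(1/N) Σ log p(x_i | x_<i)). The short sketch below assumes per-token natural-log probabilities have already been extracted from the model; the input is a synthetic example.

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_logprobs` holds natural-log probabilities log p(x_i | x_<i)
    that the model assigned to each token of the text.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


# If every token had probability 0.25, perplexity is ≈ 4: the model is as
# "surprised" as if it were choosing uniformly among 4 options at each step.
print(perplexity([math.log(0.25)] * 10))
```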
hard negative: Tokens within a trustworthy (negative) response whose representations are most similar to those of hallucinated tokens; training on them forces the model to become more discriminative
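A minimal sketch of hard-negative mining, assuming each token is represented by a hidden-state vector and that cosine similarity to the nearest hallucinated token is the hardness criterion. Both the criterion and the `select_hard_negatives` helper are illustrative assumptions; the selection rule used in practice may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_hard_negatives(negatives: List[Vector],
                          positives: List[Vector],
                          k: int) -> List[int]:
    """Return indices of the k negative tokens most similar to any hallucinated token."""
    if not positives:
        raise ValueError("need at least one hallucinated (positive) token")

    def hardness(neg: Vector) -> float:
        # Similarity to the closest hallucinated token.
        return max(cosine(neg, pos) for pos in positives)

    ranked = sorted(range(len(negatives)),
                    key=lambda i: hardness(negatives[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical 2-D hidden states.
hallucinated = [[1.0, 0.0]]
trustworthy = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.0]]
print(select_hard_negatives(trustworthy, hallucinated, k=2))  # [1, 2]
```

Tokens that already sit far from every hallucinated token contribute little training signal, so concentrating on the nearest negatives is what pushes the decision boundary to separate the genuinely confusable cases.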