BatchEnsemble: A parameter-efficient ensemble method in which all members share one weight matrix; each member's effective weights are that matrix multiplied elementwise by a member-specific rank-one matrix (the 'fast weights')
LoRA: Low-Rank Adaptation, a technique to fine-tune large models by training only a pair of small low-rank matrices whose product is added to the frozen weights
Fast weights: In BatchEnsemble, a pair of small trainable vectors unique to each ensemble member; their outer product is a rank-1 matrix that modulates the shared weights elementwise
Faithfulness hallucination: When an LLM deviates from the provided instructions or context (e.g., answering a question that the context says is unanswerable)
Factual hallucination: When an LLM generates content that contradicts verifiable real-world facts
Predictive entropy: The entropy of the model's predictive distribution over output tokens, used as a measure of uncertainty; high entropy implies high uncertainty
Aleatoric uncertainty: Uncertainty arising from inherent noise or variability in the data (irreducible)
Epistemic uncertainty: Uncertainty arising from the model's lack of knowledge (reducible with more data)
SQuAD: Stanford Question Answering Dataset, a reading comprehension benchmark
MMLU: Massive Multitask Language Understanding, a benchmark evaluating models on factual knowledge across diverse subjects
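
To make a few of the entries above concrete, here is a minimal sketch in plain Python. It is illustrative only and not taken from any particular implementation: the function names (`batchensemble_weight`, `predictive_entropy`, `decompose_uncertainty`) are hypothetical, and the split of total uncertainty into an aleatoric and an epistemic part is one common entropy-based decomposition, assumed here rather than stated by the glossary.

```python
import math


def batchensemble_weight(shared, r, s):
    """Effective weights of one member: shared * outer(r, s), elementwise.

    shared is a list of rows (in_dim x out_dim); r has length in_dim and
    s has length out_dim. r and s are that member's fast weights.
    """
    return [
        [shared[j][k] * r[j] * s[k] for k in range(len(s))]
        for j in range(len(r))
    ]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty into (total, aleatoric, epistemic).

    Assumed decomposition: total is the entropy of the averaged
    distribution, aleatoric is the average entropy of the members,
    and epistemic is the difference between the two.
    """
    n = len(member_probs)
    mean = [sum(p[k] for p in member_probs) / n
            for k in range(len(member_probs[0]))]
    total = predictive_entropy(mean)
    aleatoric = sum(predictive_entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric


if __name__ == "__main__":
    shared = [[1.0, 2.0], [3.0, 4.0]]
    print(batchensemble_weight(shared, r=[1.0, -1.0], s=[0.5, 2.0]))
    # Two members that disagree strongly on a two-token vocabulary.
    print(decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]]))
```

When every member predicts the same distribution, the epistemic term in this sketch is zero and all remaining uncertainty is attributed to the data; disagreement between members is what raises it.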