MMLU: Massive Multitask Language Understanding—a benchmark designed to measure knowledge acquired during pre-training by evaluating models exclusively in zero-shot and few-shot settings
OPT: Open Pre-trained Transformer—a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters
Pythia: A suite of decoder-only models designed to facilitate scientific research on training dynamics and scaling
Cross-entropy loss: A loss function equal to the average negative log-probability a model assigns to the correct label (for a language model, the correct next token); lower is better
Autoregressive: A property of models that predict the next element in a sequence based on previous elements
Attention intervention: A technique where attention weights or patterns are manipulated or swapped between models to test the robustness and localization of learned capabilities
Confidence-Competence Gap: A ratio proposed by the authors measuring the divergence between improvements in loss (confidence) and improvements in accuracy (competence)
Decoder-only: A transformer architecture that uses causally masked self-attention, so each position attends only to itself and earlier positions; typical of GPT-style models trained on next-token prediction
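
The cross-entropy loss and Confidence-Competence Gap entries above both rest on the fact that loss and accuracy can move independently. The following is a minimal sketch of that idea, not the authors' ratio, whose exact formula is not given in this glossary. The helper names and the probability tables are hypothetical, chosen only to illustrate a 4-way multiple-choice setting.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class is the label."""
    hits = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda i: p[i])
        hits += int(predicted == y)
    return hits / len(labels)

# Hypothetical data: four questions, four answer options each.
labels = [0, 1, 2, 3]

# Predictions from a hypothetical earlier checkpoint.
before = [
    [0.40, 0.30, 0.20, 0.10],  # correct (label 0)
    [0.30, 0.40, 0.20, 0.10],  # correct (label 1)
    [0.40, 0.30, 0.20, 0.10],  # wrong   (label 2)
    [0.40, 0.30, 0.20, 0.10],  # wrong   (label 3)
]

# Predictions from a hypothetical later checkpoint: more probability mass
# on every correct answer, but the top-ranked option never changes.
after = [
    [0.70, 0.10, 0.10, 0.10],  # correct, more confident
    [0.10, 0.70, 0.10, 0.10],  # correct, more confident
    [0.40, 0.10, 0.30, 0.20],  # still wrong, but label 2 gained mass
    [0.40, 0.20, 0.10, 0.30],  # still wrong, but label 3 gained mass
]

loss_before = cross_entropy(before, labels)
loss_after = cross_entropy(after, labels)
acc_before = accuracy(before, labels)
acc_after = accuracy(after, labels)

print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
print(f"accuracy: {acc_before:.2f} -> {acc_after:.2f}")
```

In this toy case the loss falls while accuracy stays at 0.5, which is the kind of divergence between confidence and competence that the ratio is described as measuring.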