ICL: In-Context Learning, the ability of a model to adapt to a new task from only a few examples provided in the prompt, without updating its weights
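LLR: Log-Likelihood Ratio, the logarithm of the ratio of the likelihoods of the observed data under two competing hypotheses, $\log \frac{p(x \mid H_1)}{p(x \mid H_0)}$; by the Neyman-Pearson lemma, thresholding it yields the most powerful test, which makes it the optimal decision statistic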
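Neyman-Pearson Lemma: A statistical theorem stating that, when testing a simple null hypothesis against a simple alternative, the likelihood-ratio test is the most powerful test at a given significance level (i.e., it has the highest detection probability among all tests with the same false-positive rate)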
Sufficient Statistic: A summary of the data that retains all the information the sample carries about a parameter, so estimates and decisions can depend on the data only through it (e.g., the sample mean for a Gaussian with known variance)
Logit Lens: An interpretability technique that decodes the hidden states of intermediate layers into vocabulary space, typically by applying the model's final layer norm and unembedding matrix, to see what the model 'believes' at each layer
OV Circuit: Output-Value Circuit, the component of an attention head formed by the product of its Value and Output weight matrices ($W_{OV} = W_V W_O$), which determines how information read from attended positions is transformed and written to the residual stream
BCE: Binary Cross-Entropy, the standard loss for binary classification; for a label $y \in \{0, 1\}$ and predicted probability $p$, it is $-[y \log p + (1 - y) \log(1 - p)]$
Grokking: A phenomenon in which a model abruptly transitions from memorization to generalization, with validation accuracy improving sharply long after training accuracy has saturated
OOD: Out-of-Distribution, data that differs significantly from the training distribution (e.g., nuisance shifts larger than those seen during training)
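The LLR, Neyman-Pearson, and sufficient-statistic entries can be tied together in a minimal Python sketch. It assumes, purely for illustration, $n$ i.i.d. samples from $\mathcal{N}(\mu_0, \sigma^2)$ under $H_0$ versus $\mathcal{N}(\mu_1, \sigma^2)$ under $H_1$, with $\sigma$ known and $\mu_1 > \mu_0$; the function names are hypothetical. In this setting the summed LLR reduces to $\frac{n(\mu_1 - \mu_0)}{\sigma^2}\left(\bar{x} - \frac{\mu_0 + \mu_1}{2}\right)$, so it depends on the data only through the sample mean, and a level-$\alpha$ test can threshold $\bar{x}$ directly.

```python
import math
from statistics import NormalDist

def gaussian_llr(xs, mu0, mu1, sigma):
    """Summed per-sample LLR, log p(x | H1) - log p(x | H0), for two Gaussians (hypothetical helper)."""
    p0, p1 = NormalDist(mu0, sigma), NormalDist(mu1, sigma)
    return sum(math.log(p1.pdf(x)) - math.log(p0.pdf(x)) for x in xs)

def llr_from_mean(xbar, n, mu0, mu1, sigma):
    """Closed form of the same LLR: the data enter only through the sample mean xbar."""
    return n * (mu1 - mu0) / sigma**2 * (xbar - (mu0 + mu1) / 2)

def np_test(xs, mu0, mu1, sigma, alpha=0.05):
    """Level-alpha Neyman-Pearson test of H0: mu = mu0 vs H1: mu = mu1 (assumes mu1 > mu0).

    The LLR is increasing in xbar, so thresholding the LLR is equivalent to
    thresholding xbar; the cutoff is chosen so that P(decide H1 | H0) = alpha.
    """
    n = len(xs)
    xbar = sum(xs) / n
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / math.sqrt(n)
    return xbar > cutoff

xs = [0.9, 1.3, 0.4, 1.1, 0.8]               # toy data, sample mean 0.9
print(gaussian_llr(xs, 0.0, 1.0, 1.0))        # approximately 2.0
print(llr_from_mean(0.9, 5, 0.0, 1.0, 1.0))   # approximately 2.0, from the mean alone
print(np_test(xs, 0.0, 1.0, 1.0))             # True: 0.9 exceeds the cutoff of about 0.736
```

Thresholding the LLR at zero instead gives the maximum-likelihood decision under equal priors; the Neyman-Pearson cutoff differs because it fixes the false-positive rate at $\alpha$.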
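The logit lens can be sketched in plain Python as a toy under stated assumptions: the hidden states, the unembedding matrix `W_U`, and the two-word vocabulary are made up, and the layer norm omits the learned gain and bias a real model would apply before unembedding. Each layer's residual-stream vector is normalized, projected onto the vocabulary, and decoded to its top-scoring token.

```python
import math

def layer_norm(h, eps=1e-5):
    """Parameter-free layer norm (assumption: the learned gain and bias are omitted)."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]

def unembed(h, W_U):
    """Project a d_model vector to vocabulary logits: logits[j] = sum_i h[i] * W_U[i][j]."""
    return [sum(h_i * w for h_i, w in zip(h, col)) for col in zip(*W_U)]

def logit_lens(hidden_states, W_U, vocab):
    """Decode each layer's residual-stream vector to its top-1 token."""
    preds = []
    for h in hidden_states:
        logits = unembed(layer_norm(h), W_U)
        best = max(range(len(logits)), key=lambda j: logits[j])
        preds.append(vocab[best])
    return preds

# Toy setup (hypothetical): d_model = 3, vocabulary of two tokens.
vocab = ["yes", "no"]
W_U = [[1.0, -1.0],
       [0.0, 0.0],
       [-1.0, 1.0]]                   # d_model x vocab_size
hidden_states = [[0.0, 0.2, 1.0],     # residual stream after layer 1
                 [1.0, 0.2, 0.0]]     # residual stream after layer 2
print(logit_lens(hidden_states, W_U, vocab))  # ['no', 'yes']
```

In the toy, the model's decoded "belief" flips between layers, which is the kind of layer-by-layer trajectory the technique is used to inspect.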
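The OV circuit's factorization can be checked numerically. This sketch assumes the row-vector convention (residual vectors are rows, multiplied as $x W_V$), which matches the $W_V W_O$ ordering in the entry; texts using column vectors write the same matrix as $W_O W_V$. The attention pattern is taken as given, since computing it is the job of the QK circuit, and biases are omitted. All matrices and helper names are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def ov_matrix(W_V, W_O):
    """W_OV = W_V W_O: a d_model x d_model map of rank at most d_head."""
    return matmul(W_V, W_O)

def head_output(attn, X, W_V, W_O):
    """What the head writes to the residual stream: attn @ (X @ W_V) @ W_O."""
    return matmul(matmul(attn, matmul(X, W_V)), W_O)

# Toy sizes: d_model = 3, d_head = 1, two token positions.
X = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]           # residual stream, one row per token
W_V = [[1.0], [0.0], [2.0]]     # d_model x d_head
W_O = [[0.5, -1.0, 0.0]]        # d_head x d_model
attn = [[1.0, 0.0],
        [0.5, 0.5]]             # causal attention pattern (rows sum to 1)

via_factors = head_output(attn, X, W_V, W_O)
via_ov = matmul(attn, matmul(X, ov_matrix(W_V, W_O)))
print(via_factors == via_ov)    # True
```

Because the attention pattern only mixes positions, everything the head can write is determined by $W_{OV}$ applied to the attended residual vectors, which is why the two factors are analyzed as a single matrix.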
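The BCE loss, $-[y \log p + (1 - y) \log(1 - p)]$ for a label $y \in \{0, 1\}$ and predicted probability $p$, can be written out directly. The sketch below, with hypothetical function names, computes it both from a probability and from a logit $z$; the logit form uses the standard rearrangement $\max(z, 0) - zy + \log(1 + e^{-|z|})$, which equals $\mathrm{BCE}(\sigma(z), y)$ but avoids overflow and loss of precision for large $|z|$.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability (naive form; fine for moderate z)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-12):
    """BCE for predicted probability p and label y in {0, 1}; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def bce_with_logits(z, y):
    """Numerically stable BCE computed directly from the logit z."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

print(bce(0.5, 1), math.log(2))     # both about 0.6931
print(bce_with_logits(100.0, 0))    # about 100.0, with no overflow
```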