monofact: A fact that appears exactly once in the training data; its prevalence is a key driver of hallucination according to the Kalai-Vempala bound
miscalibration: The gap between a model's predicted confidence scores and the empirical frequency with which its predictions are correct; usually minimized in ML, but deliberately increased here to reduce hallucination
selective upweighting: A training intervention where a small subset of data is repeated multiple times to force the model to become overconfident (miscalibrated) on those examples
Pareto distribution: A heavy-tailed probability distribution used here to generate training data, allowing precise control over the frequency of rare facts
Good-Turing estimator: A statistical method for estimating the total probability of encountering unseen elements (such as unseen species or words) from frequency counts; the unseen mass is approximated by the fraction of observations that occur exactly once
KL divergence: Kullback-Leibler divergence, an asymmetric measure of how one probability distribution differs from another; used here as an empirical proxy for miscalibration
n-gram model: A simple probabilistic language model that predicts the next item in a sequence based on the (n-1) previous items
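
To make three of the terms above concrete (monofact, Pareto distribution, KL divergence), here is a minimal Python sketch. It is an illustration only, not the method used in the work these terms come from. The function names (monofact_rate, kl_divergence, sample_pareto_facts) are hypothetical, and rounding Pareto draws down to integers to obtain "fact IDs" is an assumption made purely for the example.

```python
import math
import random
from collections import Counter


def monofact_rate(samples):
    """Fraction of observations whose value occurs exactly once.

    This is also the Good-Turing estimate of the unseen probability mass.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as dicts.

    Assumes q[x] > 0 wherever p[x] > 0; terms with p[x] == 0 contribute 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def sample_pareto_facts(n, alpha, seed=0):
    """Draw n integer 'fact IDs' from a Pareto distribution (assumed encoding).

    Smaller alpha gives a heavier tail, hence more facts seen only once.
    """
    rng = random.Random(seed)
    return [int(rng.paretovariate(alpha)) for _ in range(n)]


if __name__ == "__main__":
    for alpha in (0.5, 3.0):
        facts = sample_pareto_facts(1000, alpha)
        print(f"alpha={alpha}: monofact rate = {monofact_rate(facts):.3f}")
```

Lowering alpha makes the tail heavier, so more fact IDs are seen only once and the monofact rate rises; this is the sense in which a Pareto-generated corpus gives control over the frequency of rare facts.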