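diagonal argument: A proof technique, originating with Cantor, that constructs an object differing from every entry of an assumed enumeration; it shows that certain sets (like the real numbers) are strictly larger than others (like the integers) and is often used to prove undecidability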
computability theory: The branch of logic and computer science that studies which problems can be solved by an algorithm, formalized as a Turing machine
LMT: Language Model Trainer, a map that takes a training dataset and returns a computable language model
hallucination probability: The probability that, for a randomly drawn input, a language model generates an output outside the acceptable ground-truth set for that input
acceptable output set map: A ground-truth map F0 assigning a set of valid/factual output strings to every input string
qualified random training data: A dataset whose inputs are drawn i.i.d. and whose outputs are each guaranteed to lie in the ground-truth acceptable set for the corresponding input
statistical negligibility: The property that the probability of error (hallucination) can be made arbitrarily close to zero with high confidence given enough data
uniform statistical negligibility: Negligibility where the required training data size depends only on the error tolerance, not on the specific data distribution
CDF: Cumulative Distribution Function, which gives the probability that a random variable (here, input length) is less than or equal to a given value
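
As a rough illustration of how the "hallucination probability" and "acceptable output set map" entries fit together, the following Python sketch estimates the hallucination probability by sampling. Everything in it is a hypothetical stand-in chosen for illustration, not taken from the source: `acceptable_outputs` plays the role of F0 (here, the only valid output is the reversed input), `toy_model` is a deliberately flawed model, and `sample_input` draws input lengths uniformly from 1 to 6.

```python
import random
from typing import Callable, Set


def acceptable_outputs(x: str) -> Set[str]:
    """Hypothetical stand-in for F0: the only acceptable output is x reversed."""
    return {x[::-1]}


def toy_model(x: str) -> str:
    """Hypothetical model: correct on inputs of length <= 3, wrong on longer ones."""
    # Appending "?" changes the length, so the result can never equal x reversed.
    return x[::-1] if len(x) <= 3 else x + "?"


def sample_input(rng: random.Random) -> str:
    """Assumed input distribution: length uniform on 1..6, characters from {a, b}."""
    length = rng.randint(1, 6)
    return "".join(rng.choice("ab") for _ in range(length))


def estimate_hallucination_probability(
    model: Callable[[str], str],
    f0: Callable[[str], Set[str]],
    sampler: Callable[[random.Random], str],
    n: int,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(model(x) not in f0(x)) over i.i.d. inputs x."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x = sampler(rng)
        if model(x) not in f0(x):
            errors += 1
    return errors / n


if __name__ == "__main__":
    p_hat = estimate_hallucination_probability(
        toy_model, acceptable_outputs, sample_input, n=20_000
    )
    print(f"estimated hallucination probability: {p_hat:.3f}")
```

Under these assumptions the toy model errs exactly on inputs of length 4, 5, or 6, so the estimate should settle near 0.5 as the number of samples grows. The input-length distribution used by `sample_input` is the kind of object the CDF entry describes.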