Kolmogorov Structure Function: A function characterizing the compressibility of data given a constraint on the model's complexity (size).
Two-part Code: A compression scheme describing data by first describing the model, then describing the data using that model.
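A minimal sketch of the two-part idea, using a Bernoulli model for a binary string: the total description length is the bits needed to state the model (a quantized probability p) plus the Shannon code length of the data under that model. The function name, quantization scheme, and precision are illustrative assumptions, not a standard implementation.

```python
import math

def two_part_code_length(data, precision_bits=8):
    """Two-part code length for a binary string under a Bernoulli model.

    Part 1: describe the model (p quantized to `precision_bits` bits).
    Part 2: describe the data with Shannon code length -log2 P(data | p).
    Illustrative sketch; the quantization grid is an assumption.
    """
    n = len(data)
    k = sum(data)  # number of ones
    levels = 2 ** precision_bits
    # Quantize the MLE p = k/n to the grid, keeping p strictly in (0, 1).
    p = max(1, min(levels - 1, round(k / n * levels))) / levels
    model_bits = precision_bits
    data_bits = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    return model_bits + data_bits

# A highly regular string compresses much better than a balanced one.
regular = [1] * 95 + [0] * 5
balanced = [1, 0] * 50
print(two_part_code_length(regular) < two_part_code_length(balanced))  # → True
```

The trade-off in the Kolmogorov structure function is exactly this: spending more bits on the model (part one) can shrink part two, and vice versa.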
Redundancy: The difference between the expected code length achieved by a model and the entropy of the data source, i.e. the extra bits paid for using an imperfect model.
Pitman-Yor Process: A stochastic process used in Bayesian nonparametrics to generate power-law distributed data (like word frequencies in natural language).
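The power-law behavior can be seen by simulating the Chinese-restaurant construction of the Pitman-Yor process; a sketch, with discount and concentration values chosen for illustration:

```python
import random

def pitman_yor_sample(n, discount=0.5, alpha=1.0, seed=0):
    """Sample n tokens via the Chinese-restaurant construction of a
    Pitman-Yor process. New "tables" (word types) receive fresh integer
    labels; a discount > 0 produces power-law table sizes.
    Parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = occurrences of type k so far
    tokens = []
    for total in range(n):  # total = customers already seated
        # New type with probability (alpha + discount * K) / (total + alpha)
        if rng.random() < (alpha + discount * len(counts)) / (total + alpha):
            counts.append(1)
            tokens.append(len(counts) - 1)
        else:
            # Existing type k chosen with probability ∝ counts[k] - discount
            r = rng.random() * (total - discount * len(counts))
            for k, c in enumerate(counts):
                r -= c - discount
                if r < 0:
                    break
            counts[k] += 1
            tokens.append(k)
    return tokens

tokens = pitman_yor_sample(2000)
top = sorted(set(tokens), key=tokens.count, reverse=True)
# The sorted frequencies show the heavy tail: a few very frequent types,
# many rare ones, as in natural-language word counts.
```

With discount 0, this reduces to the ordinary Chinese restaurant process (Dirichlet process), whose type frequencies decay faster than a power law; the discount is what produces Zipf-like tails.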
Zipf's Law: An empirical law stating that the frequency of a token is approximately inversely proportional to its rank in the frequency table.
Heaps' Law: An empirical law describing how the number of distinct vocabulary items grows sublinearly, roughly as a power of the size of the document collection.
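The two laws are linked: sampling tokens from a Zipf-like (rank^-s) distribution produces sublinear vocabulary growth of the Heaps kind. A minimal sketch, where the vocabulary size and exponent are illustrative assumptions:

```python
import random

def heaps_curve(n_tokens, vocab=50_000, s=1.0, seed=0):
    """Sample tokens from a Zipfian (rank^-s) distribution over `vocab`
    types and record how the number of distinct types seen grows with
    corpus size (Heaps' law). Illustrative sketch.
    """
    rng = random.Random(seed)
    weights = [1 / r**s for r in range(1, vocab + 1)]
    seen, growth = set(), []
    for tok in rng.choices(range(vocab), weights=weights, k=n_tokens):
        seen.add(tok)
        growth.append(len(seen))
    return growth

growth = heaps_curve(20_000)
# Sublinear growth: doubling the corpus far less than doubles the vocabulary.
print(growth[-1] < 2 * growth[9_999])  # → True
```

On a log-log plot, such a curve is close to a straight line over a wide range, which is the usual empirical signature of Heaps' law.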
Minimal Sufficient Statistic: The simplest model that captures all regularities in the data, leaving only incompressible noise as the residual.
Scaling Laws: Empirical relationships predicting model performance (loss) as a power-law function of compute, data size, or parameter count.
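Fitting such a relationship typically means least squares in log-log space, where a power law becomes a straight line. A minimal sketch on synthetic data (real scaling-law fits also model an irreducible-loss offset, omitted here):

```python
import math

def fit_power_law(xs, ys):
    """Fit L = a * x^(-b) by ordinary least squares on (log x, log y).
    Returns (a, b). Minimal sketch without an irreducible-loss term.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
             / sum((x - mx) ** 2 for x in lx))
    a = math.exp(my - slope * mx)
    return a, -slope  # slope of log L vs log x is -b

# Synthetic losses following L(N) = 10 * N^-0.3 over parameter counts N.
params = [1e6, 1e7, 1e8, 1e9]
losses = [10 * N ** -0.3 for N in params]
a, b = fit_power_law(params, losses)
print(round(a, 3), round(b, 3))  # → 10.0 0.3
```

Because the synthetic data is exactly a power law, the fit recovers the coefficients; on real loss curves the log-log relationship is only approximately linear over a limited range.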