MLE: Maximum Likelihood Estimation, a method for estimating the parameters of a probability distribution by choosing the values that maximize the likelihood of the observed data
ERM: Empirical Risk Minimization, a principle in statistical learning theory that defines a family of learning algorithms, each of which selects the function in a given class that minimizes the average loss on the training data
Latent Variable Model: A statistical model that relates a set of observable variables to a set of unobservable variables (latents)
Factor Model: A model where high-dimensional observed variables are modeled as linear combinations of a smaller number of latent factors plus noise
GMM: Gaussian Mixture Model, a probabilistic model that assumes each data point is generated from one component of a mixture of a finite number of Gaussian distributions
Rademacher Complexity: A measure of the richness of a class of real-valued functions, based on how well the class can correlate with random ±1 signs on a sample; used to derive generalization bounds
Covering Number: A measure of the size of a function class, defined as the minimum number of balls of a given radius, under a chosen metric, needed to cover the class
Excess Risk: The difference between the risk (error) of the learned function and the risk of the best possible function in the class
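
For reference, the definitions above can be written in symbols. This is a sketch in standard textbook notation, not notation taken from the source: the sample z_1, ..., z_n, the loss \ell, the function class \mathcal{F}, the data distribution P, the metric d, and all parameter names are assumptions introduced here for illustration.

```latex
% Assumed notation: sample z_1,...,z_n drawn i.i.d. from P, loss \ell,
% function class \mathcal{F}, population risk R(f) = E_{z ~ P}[\ell(f, z)].

% MLE: maximize the log-likelihood of the observed data x_1,...,x_n
\[
  \hat{\theta}_{\mathrm{MLE}} \in \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\theta}(x_i)
\]

% ERM: minimize the average training loss over the class
\[
  \hat{f}_{\mathrm{ERM}} \in \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f, z_i)
\]

% Factor model: observed x in R^p, latent factors h in R^k with k < p,
% loading matrix \Lambda in R^{p x k}, noise \varepsilon
\[
  x = \Lambda h + \varepsilon
\]

% GMM: density with K components, weights \pi_k \ge 0 summing to 1
\[
  p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
\]

% Empirical Rademacher complexity: \sigma_i are i.i.d. uniform on \{-1, +1\}
\[
  \hat{\mathfrak{R}}_n(\mathcal{F})
    = \mathbb{E}_{\sigma}\!\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i) \right]
\]

% Covering number at radius \epsilon with respect to a metric d
\[
  N(\epsilon, \mathcal{F}, d)
    = \min\Bigl\{ m : \exists\, f_1, \dots, f_m \text{ such that }
      \mathcal{F} \subseteq \bigcup_{j=1}^{m} \{ f : d(f, f_j) \le \epsilon \} \Bigr\}
\]

% Excess risk of a learned function \hat{f} relative to the best in class
\[
  R(\hat{f}) - \inf_{f \in \mathcal{F}} R(f),
  \qquad R(f) = \mathbb{E}_{z \sim P}\bigl[ \ell(f, z) \bigr]
\]
```

With the negative log-likelihood as the loss, \ell(\theta, x) = -\log p_{\theta}(x), the MLE objective is a special case of the ERM objective.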