VAE: Variational Autoencoder—a generative model that learns probabilistic latent representations of data
GMM: Gaussian Mixture Model—a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions
MoE: Mixture of Experts—a neural network architecture where different parts of the network (experts) specialize in different subsets of the data
ELBO: Evidence Lower Bound—a tractable lower bound on the log marginal likelihood (evidence) of the data, maximized during VAE training in place of the intractable true likelihood
Reparameterization Trick: A technique allowing gradients to backpropagate through stochastic nodes in a neural network by separating randomness from parameters
Collaborative Filtering: A recommendation technique that predicts user preferences by assuming that users who agreed in the past will agree in the future
Top-k: Selection strategy choosing the k highest-scoring options (e.g., experts)
Gate: A neural network component that decides which experts should process a given input
PPL: Perplexity—the exponential of the average negative log-likelihood per token, measuring how well a probability model predicts a sample (lower is better)
BLEU: Bilingual Evaluation Understudy—a metric that scores machine-translated text by its n-gram overlap with one or more reference translations (higher is better)
ROUGE: Recall-Oriented Understudy for Gisting Evaluation—a set of metrics used to evaluate automatic summarization and machine translation software in NLP
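
As a rough illustration of two of the entries above (PPL and Top-k gating), here is a minimal sketch in plain Python. The function names `perplexity` and `top_k_gate` are hypothetical, chosen only for this example. Renormalizing the gate weights with a softmax over the selected scores is one common choice and is assumed here, not taken from any particular model.

```python
import math

def perplexity(token_probs):
    """Perplexity: exp of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def top_k_gate(scores, k):
    """Pick the k highest-scoring experts and return {expert_index: weight}.

    Assumption: weights are a softmax over the selected scores only.
    """
    # Indices of the k largest gate scores (ties keep their original order).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}
```

For instance, a model that assigns probability 0.25 to every token in a sample has a perplexity of 4, which matches the intuition of choosing uniformly among four options at each step.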