Latent Thought Vectors: Continuous vector representations (z) in a latent space that guide the generation of the visible token sequence
Dual-rate optimization: A training scheme alternating between optimizing local parameters (latent vectors specific to a sequence) and global parameters (model weights shared across all data)
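A minimal toy sketch of the alternating scheme (plain NumPy; a linear decoder stands in for the actual model, and all shapes and learning rates are illustrative assumptions): fast inner-loop gradient steps on the per-example latents, then a slow outer step on the shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "decoder": each observation x is produced from its own latent z
# through shared weights W.  Shapes and rates here are illustrative only.
d_z, d_x, n = 4, 8, 32
W_true = rng.normal(size=(d_x, d_z)) / np.sqrt(d_z)
X = rng.normal(size=(n, d_z)) @ W_true.T + 0.01 * rng.normal(size=(n, d_x))

W = 0.1 * rng.normal(size=(d_x, d_z))  # global (slow) parameters
Z = np.zeros((n, d_z))                 # local (fast) per-example latents

fast_lr, slow_lr = 0.1, 0.05           # the two rates in "dual-rate"

for step in range(300):
    # Fast inner loop: several gradient steps on the latents Z, weights fixed.
    for _ in range(5):
        resid = Z @ W.T - X            # (n, d_x) reconstruction error
        Z -= fast_lr * resid @ W       # grad of 0.5*||Wz - x||^2 w.r.t. z
    # Slow outer step: one gradient step on the shared weights W.
    resid = Z @ W.T - X
    W -= slow_lr * resid.T @ Z / n     # averaged grad w.r.t. W

mse = float(np.mean((Z @ W.T - X) ** 2))
print(f"final reconstruction MSE: {mse:.5f}")
```

The point of the separation is that Z is re-estimated per sequence (and can be re-optimized at inference time), while W is updated slowly across all data.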
Inference-time compute: Computational effort spent during the generation phase (specifically optimizing latent vectors) to improve output quality, distinct from training compute
ELBO: Evidence Lower Bound—a tractable lower bound on the log marginal likelihood, maximized in variational inference in place of the intractable likelihood itself
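In standard variational-inference notation (θ for model weights, φ for variational parameters; these symbols are generic conventions, not necessarily the paper's), the bound reads:

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{ELBO}}
```

The gap between the two sides is exactly KL(q_φ(z|x) ‖ p_θ(z|x)), so maximizing the ELBO simultaneously tightens the bound and fits the model.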
Posterior collapse: A failure mode in VAEs where the model ignores the latent variable z and generates based solely on the autoregressive decoder
trFLOPs/tok: Training floating-point operations per token—a per-token measure of the computational cost of training
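As a back-of-the-envelope illustration (the common 6N-FLOPs-per-token rule of thumb and the parameter count below are assumptions for the example, not figures from this glossary):

```python
# Widely used estimate: training cost ≈ 6 * N FLOPs per token
# (roughly 2N for the forward pass, 4N for the backward pass),
# where N is the model's parameter count.
n_params = 124e6                      # hypothetical GPT-2-small-scale model
trflops_per_tok = 6 * n_params
print(f"{trflops_per_tok:.2e} trFLOPs/tok")  # 7.44e+08
```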
Cross-attention: Attention mechanism where the model attends to the latent thought vectors (keys/values) using the text sequence as queries
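A minimal single-head NumPy sketch of this query/key-value arrangement (dimensions and random weights are purely illustrative): the token hidden states form the queries, and the latent thought vectors supply the keys and values.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_h, latents, Wq, Wk, Wv):
    """Text hidden states attend to latent thought vectors.

    text_h:  (T, d) per-token hidden states -> queries
    latents: (L, d) latent thought vectors  -> keys and values
    """
    Q = text_h @ Wq                           # (T, d_k)
    K = latents @ Wk                          # (L, d_k)
    V = latents @ Wv                          # (L, d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, L)
    attn = softmax(scores, axis=-1)           # each token's weights over latents
    return attn @ V                           # (T, d_k)

rng = np.random.default_rng(0)
d, d_k, T, L = 16, 8, 5, 3
out = cross_attention(
    rng.normal(size=(T, d)), rng.normal(size=(L, d)),
    rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)),
)
print(out.shape)  # (5, 8)
```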
Langevin dynamics: An iterative method for sampling from a probability distribution by following the gradient of its log-density with injected Gaussian noise
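A minimal sketch, assuming a standard Gaussian target so the score ∇ log p(z) = −z is known in closed form (step size and iteration count are illustrative): each update takes a half-step along the gradient of the log-density plus scaled Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(z):
    # Score of a standard Gaussian target p(z) ∝ exp(-z²/2): ∇ log p(z) = -z
    return -z

eps = 0.01                            # step size
z = 3.0 * rng.normal(size=5000)       # start far from the target distribution
for _ in range(2000):
    noise = rng.normal(size=z.shape)
    z = z + 0.5 * eps * grad_log_p(z) + np.sqrt(eps) * noise

print(f"mean {z.mean():+.2f}, std {z.std():.2f}")  # drifts toward N(0, 1)
```

After enough iterations the particles are distributed approximately as the target, which is why the same update can be used to sample (or refine) latent vectors given a gradient of their log-posterior.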
MAUVE: A metric for evaluating text generation quality by comparing the distribution of generated text with that of human-written text