CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps before the final answer.
cHMM: Conditional Hidden Markov Model—a statistical model where the system transitions between hidden states (thoughts) conditioned on an input, and emits observable outputs (text).
ELBO: Evidence Lower Bound—a tractable lower bound on the log marginal likelihood, maximized in variational inference as a surrogate objective when the true posterior distribution over latent variables is intractable.
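A minimal sketch of the bound, using a toy Gaussian model chosen for illustration (the prior p(z)=N(0,1), likelihood p(x|z)=N(z,1), and variational family q(z)=N(mu,1) are assumptions, not from the source): a Monte Carlo estimate of the ELBO never exceeds the exact log marginal likelihood.

```python
import math
import random

def log_normal(x, mean, var):
    # Log density of N(mean, var) evaluated at x.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def elbo_estimate(mu, x=1.0, n=50_000, seed=0):
    # Monte Carlo estimate of E_q[log p(z) + log p(x|z) - log q(z)]
    # for q(z) = N(mu, 1); this is the ELBO for observation x.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, 1.0)                   # z ~ q(z)
        total += (log_normal(z, 0.0, 1.0)        # log p(z)
                  + log_normal(x, z, 1.0)        # log p(x|z)
                  - log_normal(z, mu, 1.0))      # - log q(z)
    return total / n

# Marginalizing z gives x ~ N(0, 2), so log p(x) is known exactly here,
# and the estimated ELBO sits below it by the KL gap to the true posterior.
log_px = log_normal(1.0, 0.0, 2.0)
```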
NAR: Non-Autoregressive—generating multiple tokens or outputs in parallel rather than one by one.
Latent Space: A continuous vector space where the model represents information (thoughts) internally, as opposed to the discrete space of tokens.
Sparsity Loss: A penalty term (typically the L1 norm) encouraging the model to keep only a few dimensions of the latent representation active, making the representation more interpretable.
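A minimal sketch of such a penalty (the function name and weight `lam` are illustrative, not from the source): an L1 term added to the task loss, which is smaller for a latent vector with few active dimensions than for a dense one of similar magnitude.

```python
def l1_sparsity_loss(z, lam=0.01):
    # L1 penalty on a latent vector z, scaled by weight lam;
    # in training this would be added to the main task loss.
    return lam * sum(abs(v) for v in z)

dense  = [0.9, -0.7, 0.8, 0.5]   # many active dimensions
sparse = [0.9,  0.0, 0.0, 0.0]   # one active dimension
assert l1_sparsity_loss(sparse) < l1_sparsity_loss(dense)
```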
Teacher-Forcing: A training technique where the model is fed the ground-truth output from the previous step as its input, rather than its own prediction from that step.
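A minimal sketch of the contrast, with a hypothetical one-parameter "model" standing in for a real network (all names here are illustrative): under teacher forcing the input at step t is the ground-truth token y[t-1], while in free running it is the model's own previous prediction, so an imperfect model drifts off the target sequence.

```python
def step(prev_token, weight):
    # Hypothetical one-parameter model: predicts the next token
    # as a scaled copy of the previous one.
    return weight * prev_token

def teacher_forced_inputs(targets, start=1.0):
    # Input at step t is the ground-truth previous token (shifted right).
    return [start] + targets[:-1]

def free_running_inputs(targets, weight, start=1.0):
    # Input at step t is the model's own prediction from step t-1.
    inputs, prev = [], start
    for _ in targets:
        inputs.append(prev)
        prev = step(prev, weight)
    return inputs

# With targets [2.0, 4.0, 8.0] and a mis-scaled weight of 0.5,
# teacher forcing keeps the inputs on the ground-truth trajectory,
# while free running compounds the model's own errors.
```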