Autoregressive (AR) Generation: Generating tokens one by one, where each new token depends strictly on the previous context
Non-Autoregressive (Non-AR) Generation: Generating tokens without strict left-to-right dependency, allowing global or parallel context access
Speculative Decoding: A 'Draft-and-Verify' technique where a small model guesses future tokens that are then verified in parallel by the large target model
Draft Model: A smaller, faster model used in speculative decoding to generate candidate tokens
Target Model: The main, large model that verifies drafts and determines the final output distribution
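The draft/target contract above can be sketched as a toy draft-and-verify loop. This is a minimal greedy-decoding sketch, not a real implementation: `draft_model` and `target_model` are stand-in functions over integer tokens, and a real system would verify all drafted positions in one parallel forward pass of the target.

```python
# Toy stand-ins for the two models: each maps a context (tuple of
# tokens) to its next-token prediction. Purely illustrative.
def draft_model(context):
    # Cheap heuristic: always predicts the next integer.
    return context[-1] + 1

def target_model(context):
    # "Ground truth": next integer, but wraps to 0 after 5.
    nxt = context[-1] + 1
    return nxt if nxt <= 5 else 0

def speculative_step(context, k=4):
    """Draft k tokens, then verify them against the target model.

    Returns the accepted prefix plus one token from the target (the
    correction on mismatch, or a bonus token if all drafts pass), so
    the output is identical to what the target alone would produce.
    """
    # 1. Draft phase: the small model guesses k tokens autoregressively.
    drafts, ctx = [], list(context)
    for _ in range(k):
        t = draft_model(tuple(ctx))
        drafts.append(t)
        ctx.append(t)

    # 2. Verify phase: the target checks every drafted position (a
    # single parallel forward pass in a real system).
    accepted, ctx = [], list(context)
    for t in drafts:
        if target_model(tuple(ctx)) != t:
            break
        accepted.append(t)
        ctx.append(t)

    # 3. The target always contributes one extra token.
    accepted.append(target_model(tuple(ctx)))
    return accepted

tokens = [0]
while len(tokens) < 12:
    tokens.extend(speculative_step(tuple(tokens)))
print(tokens[:12])  # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
```

When the draft agrees with the target (tokens 1..4 here), one step emits k+1 tokens for a single target pass; on the first disagreement the target's own prediction is kept, which is what preserves exactness.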
One-Shot Generation: A Non-AR method that generates all tokens in a single forward pass
Masked Generation: Iterative generation where subsets of tokens are masked and predicted in parallel (e.g., Mask-Predict, Diffusion)
Diffusion-based Generation: A Non-AR approach using iterative denoising to refine a sequence from random noise to text in parallel steps
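The iterative unmasking schedule behind Mask-Predict-style generation can be sketched as follows. This is a hedged toy: `toy_predict` is an oracle stand-in for a parallel decoder (it returns a token and a confidence per position), and the linear re-masking schedule n·(T−t)/T mirrors the one used in Mask-Predict.

```python
MASK = "<mask>"

def mask_predict(length, predict_fn, steps=4):
    """Start fully masked; at step t, predict every position in
    parallel, then re-mask the length*(steps-t)/steps least confident
    tokens so they can be revised in the next round."""
    seq = [MASK] * length
    for t in range(1, steps + 1):
        preds = predict_fn(seq)            # (token, confidence) per slot
        seq = [tok for tok, _ in preds]
        n_mask = int(length * (steps - t) / steps)
        if n_mask:
            # Re-mask the least confident positions.
            order = sorted(range(length), key=lambda i: preds[i][1])
            for i in order[:n_mask]:
                seq[i] = MASK
    return seq

TARGET = list("parallel")

def toy_predict(seq):
    # Oracle stand-in for the decoder: predicts the right character,
    # with made-up confidences (higher near the sequence center).
    return [(TARGET[i], -abs(i - len(seq) // 2)) for i in range(len(seq))]

print("".join(mask_predict(len(TARGET), toy_predict)))  # parallel
```

Diffusion-based generation follows the same outer shape: a fixed number of parallel refinement steps over the whole sequence, with the per-step update being denoising rather than unmasking.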
Token Tree: A prefix-tree structure used during verification to process multiple candidate draft sequences simultaneously in one batched forward pass
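A minimal sketch of how candidate sequences merge into a token tree. The flat `(token, parent_index)` layout is the kind of structure tree-attention kernels consume; the function name and representation here are illustrative, not a specific library's API.

```python
def build_token_tree(candidates):
    """Merge candidate token sequences into a prefix tree so shared
    prefixes are verified only once.

    Returns a flat list of (token, parent_index) nodes; the root is
    parent index -1.
    """
    nodes = []      # (token, parent) pairs, in insertion order
    children = {}   # (parent_index, token) -> node index
    for seq in candidates:
        parent = -1
        for tok in seq:
            key = (parent, tok)
            if key not in children:
                children[key] = len(nodes)
                nodes.append((tok, parent))
            parent = children[key]
    return nodes

tree = build_token_tree([[1, 2, 3], [1, 2, 4], [1, 5]])
# Shared prefix [1, 2] is stored once: 5 nodes instead of 8 tokens.
```

Verifying the tree in one forward pass (with an attention mask that lets each node attend only to its ancestors) scores all three candidate continuations at the cost of five token positions.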
Self-drafting: Using the target model itself (often with skipped layers or early exits) to generate draft tokens instead of a separate model
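The early-exit flavor of self-drafting can be sketched with a toy layer stack: the draft is the target's own forward pass truncated after a few layers, so no separate model is stored or run. The layer functions and the `% 100` readout are made-up stand-ins for transformer layers and an output head.

```python
# Toy "target model": a stack of layer functions over an integer state.
LAYERS = [lambda h: h + 7, lambda h: h * 3, lambda h: h + 1, lambda h: h * 2]

def forward(h, n_layers):
    """Run the first n_layers of the stack, then a shared readout head."""
    for layer in LAYERS[:n_layers]:
        h = layer(h)
    return h % 100  # toy readout shared by the draft and target paths

draft_token  = lambda h: forward(h, 2)            # early exit: 2 layers
target_token = lambda h: forward(h, len(LAYERS))  # full-depth pass
```

The drafted token is then accepted only where the full-depth pass agrees, exactly as with a separate draft model; the saving is that the draft reuses the target's weights and its early-layer computation.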