Speculative Decoding: An acceleration technique where a small draft model generates candidate tokens that are then verified in parallel by a larger target model
N-to-K verification: A verification setting where a draft model proposes N sequences, and the step is accepted only if all K sequences required by beam search are found among them
RKLD: Reverse Kullback-Leibler Divergence, a divergence measure (not a true metric, since it is asymmetric) used here as an objective to align the draft model's distribution with the target model's distribution
TVD: Total Variation Distance, a measure of the difference between two probability distributions, minimized here to close the gap between draft and target probabilities
AtSpeed-S: The proposed alignment objective for Strict Top-K verification, optimizing the draft model to match the target's top-K set exactly
AtSpeed-R: The proposed alignment objective for Relaxed sampling verification, optimizing the draft model to match the target's distribution flexibly
Beam Search: A heuristic search algorithm that, at each step, expands only a fixed number of the most promising candidates (the beam width) instead of every possible continuation
Codebook-based item identifier: Representing items as sequences of discrete codes/tokens rather than raw text, ensuring fixed lengths for simpler processing
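
To make three of the terms above concrete (RKLD, TVD, and N-to-K verification), here is a minimal Python sketch. It is an illustration under stated assumptions, not the method's actual implementation: distributions are plain lists over a shared discrete vocabulary, "reverse" is read as KL(draft || target), and the function names `reverse_kl`, `total_variation`, and `n_to_k_accept` are hypothetical.

```python
import math


def reverse_kl(draft, target, eps=1e-12):
    """KL(draft || target) for two discrete distributions.

    Assumption: "reverse" means the expectation is taken under the
    draft distribution. eps guards against log(0).
    """
    return sum(
        q * math.log((q + eps) / (p + eps))
        for q, p in zip(draft, target)
        if q > 0
    )


def total_variation(draft, target):
    """Total variation distance: half the L1 distance between the two."""
    return 0.5 * sum(abs(q - p) for q, p in zip(draft, target))


def n_to_k_accept(draft_sequences, target_top_k):
    """Accept the step only if all K target sequences appear among the N drafts."""
    drafted = {tuple(seq) for seq in draft_sequences}
    return all(tuple(seq) in drafted for seq in target_top_k)
```

For example, `n_to_k_accept([[1, 2], [3, 4], [5, 6]], [[1, 2], [5, 6]])` accepts because both required sequences were drafted, while dropping `[5, 6]` from the drafts causes a rejection. Both distance functions return 0 when the draft and target distributions are identical.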