L2D: Light Latent-space Decoding—the proposed method that decodes items by matching hidden states rather than generating text
Autoregressive decoding: Generating text one token at a time, where each new token depends on all previously generated tokens
Language-space decoding: The standard LLM process of generating output as natural language text (e.g., item titles)
Latent space: The internal high-dimensional vector space of the LLM where input text is represented as numerical embeddings
Hidden state: The vector representation of the input at the final layer of the LLM, before the classification head
Reservoir sampling: A randomized single-pass algorithm that draws a uniform random sample of k items, without replacement, from a stream of n items, where n is very large or not known in advance
NDCG: Normalized Discounted Cumulative Gain, a ranking-quality measure that rewards placing relevant items near the top of the list and is normalized by the score of the ideal ranking, so values lie between 0 and 1
Beam search: A heuristic search algorithm that keeps a fixed number of the highest-scoring partial sequences (beams) at each step, approximating the most likely sequence of tokens without guaranteeing that it is found
SFT: Supervised Fine-Tuning—training the LLM on labeled (prompt, ground-truth item) pairs