CTR: Click-Through Rate, the ratio of users who click a specific link to the total number of users who view the page, email, or advertisement containing it.
Item Pool (IP): The set of candidate items (questions) currently being recommended to users.
LLM Optimizer: Using an LLM to improve a solution by providing it with the problem description and previous attempts' performance in the prompt, rather than updating weights.
Generative Explore-Exploit: A strategy where the model generates content likely to succeed based on history (exploit) while also generating diverse content to find new interests (explore).
User Persona: A structured textual description of a user type (e.g., 'Price-conscious shopper') used to simulate user behavior and preferences.
Rejection Score (RS): A threshold logit value in the user simulator; if no item's relevance score exceeds this threshold, the simulated user chooses 'no click'.
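
As an illustration of how the Rejection Score and CTR definitions fit together, the sketch below treats the Rejection Score as a hard threshold on per-item relevance logits and computes CTR as the fraction of impressions that end in a click. The function names (`simulate_click`, `click_through_rate`) and the deterministic "pick the highest-scoring item" rule are assumptions made for this example; the source does not specify how the simulator selects among items that clear the threshold.

```python
from typing import Optional, Sequence


def simulate_click(relevance_logits: Sequence[float],
                   rejection_score: float) -> Optional[int]:
    """Return the index of the clicked item, or None for 'no click'.

    Assumption: the simulated user clicks the single highest-scoring item,
    and only if its logit strictly exceeds the rejection score.
    """
    best_index: Optional[int] = None
    best_logit = rejection_score
    for index, logit in enumerate(relevance_logits):
        if logit > best_logit:
            best_index, best_logit = index, logit
    return best_index


def click_through_rate(outcomes: Sequence[Optional[int]]) -> float:
    """Fraction of impressions that resulted in a click (None means no click)."""
    if not outcomes:
        return 0.0
    clicks = sum(1 for outcome in outcomes if outcome is not None)
    return clicks / len(outcomes)


if __name__ == "__main__":
    # Hypothetical relevance logits for three impressions of a two-item pool.
    impressions = [[0.2, 1.5], [0.1, 0.4], [2.0, 0.3]]
    outcomes = [simulate_click(logits, rejection_score=1.0) for logits in impressions]
    print(outcomes, click_through_rate(outcomes))
```

Raising the rejection score in this sketch makes 'no click' more likely for the same item pool, which lowers the simulated CTR.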