LATS: Language Agent Tree Search—an agentic workflow using Monte Carlo Tree Search to explore multiple reasoning paths
CoT: Chain-of-Thought—a prompting technique encouraging the model to generate intermediate reasoning steps
ReAct: Reason+Act—an agentic pattern where the model alternates between generating reasoning traces and executing tool actions
Reflexion: An agent framework in which the model reflects in natural language on feedback from past attempts and uses those reflections to improve later attempts
KV cache: Key-Value cache—stored attention representations of past tokens used to speed up LLM generation
Prefix caching: A serving optimization that reuses the KV cache of a shared prompt prefix across requests instead of recomputing it
Test-time scaling: Improving model performance by spending more computation at inference time (e.g., via more search steps) rather than during training
vLLM: A high-throughput and memory-efficient LLM serving engine
LLMCompiler: An agent framework that reduces latency by generating tool calls that can run in parallel and streaming them for asynchronous execution