_comment: REQUIRED: Define ALL technical terms, acronyms, and method names used ANYWHERE in the entire summary. After drafting the summary, perform a MANDATORY POST-DRAFT SCAN: check every section individually (Core.one_sentence_thesis, evaluation_highlights, core_problem, Technical_details, Experiments.key_results notes, Figures descriptions and key_insights). HIGH-VISIBILITY RULE: Terms appearing in one_sentence_thesis, evaluation_highlights, or figure key_insights MUST be defined—these are the first things readers see. COMMONLY MISSED: PPO, DPO, MARL, dense retrieval, silver labels, cosine schedule, clipped surrogate objective, Top-k, greedy decoding, beam search, logit, ViT, CLIP, Pareto improvement, BLEU, ROUGE, perplexity, attention heads, parameter sharing, warm start, convex combination, sawtooth profile, length-normalized attention ratio, NTP. If in doubt, define it.
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices
DPO: Direct Preference Optimization—a method for aligning language models to human preferences without a separate reward model
PPO: Proximal Policy Optimization—a reinforcement learning algorithm that constrains each policy update with a clipped surrogate objective, used for fine-tuning models on reward signals
Instruction Tuning: Fine-tuning language models on datasets of (instruction, output) pairs to improve their ability to follow user commands
Cross-Lingual Alignment: Techniques ensuring that representations or model behaviors for equivalent concepts are similar across different languages
Model Editing: Methods to modify specific knowledge or behaviors in a pre-trained model without retraining the entire network
Typology: The classification of languages according to their structural features (e.g., word order, morphology)
Zero-shot: Evaluating a model on a task for which it has seen no task-specific examples, either during training or in the prompt
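The LoRA entry above describes freezing the pre-trained weights and injecting trainable rank decomposition matrices; that mechanism can be sketched as a single adapted forward pass. This is a minimal illustration in NumPy, assuming the common `alpha / r` scaling convention and zero-initialization of the up-projection matrix; variable names (`W`, `A`, `B`) are illustrative, and real implementations apply this per layer over batched tensors.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=8):
    """Forward pass with a LoRA adapter (illustrative sketch).

    W is the frozen pre-trained weight (d_out x d_in); A (r x d_in) and
    B (d_out x r) are the small trainable low-rank matrices. Only A and B
    would receive gradients during fine-tuning.
    """
    scale = alpha / r
    return W @ x + scale * (B @ (A @ x))

d_in, d_out, r = 64, 32, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(size=(r, d_in))              # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized
x = rng.normal(size=d_in)

# With B zero-initialized, the adapter starts as a no-op: output == frozen output.
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

Because `B` starts at zero, training begins exactly at the pre-trained model's behavior and the adapter only gradually perturbs it, which is part of why this scheme is stable in practice.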
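The DPO entry above says preferences are optimized without a separate reward model; concretely, the loss compares the policy's and a frozen reference model's log-probabilities on a chosen/rejected response pair. A minimal per-pair sketch, assuming scalar total log-probabilities (real implementations batch this over tensors; the `beta` default is illustrative):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l: policy log-probabilities of the chosen (w) and
    rejected (l) responses; ref_logp_*: the same quantities under a
    frozen reference model. beta controls how far the policy may drift
    from the reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
assert abs(dpo_loss(-1.0, -2.0, -1.0, -2.0) - math.log(2.0)) < 1e-9
```

The loss falls below log 2 as soon as the policy raises the chosen response's likelihood relative to the reference more than the rejected one's, which is exactly the preference signal a learned reward model would otherwise provide.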