MLLM: Multimodal Large Language Model—an AI model that can process and generate content across multiple modalities, such as text and video
SFT: Supervised Fine-Tuning—training a pre-trained model on a labeled dataset to adapt it to a specific task
CoT: Chain-of-Thought—a prompting or training technique where the model generates intermediate reasoning steps before the final answer
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that samples a group of outputs for the same input and scores each one relative to the rest of the group, removing the need for a separate critic network
Ego-centric: First-person perspective, typically from a camera worn on the head or body (e.g., smart glasses)
Reverse Thinking: A reasoning strategy, modeled in this work, in which the system works through a sequence of events (such as a route) in reverse order to verify or derive the correct forward sequence
KL divergence: Kullback-Leibler divergence—an asymmetric measure of how much one probability distribution differs from another, used here as a penalty to keep the model tuned with reinforcement learning (RL) from drifting too far from its initial SFT state
PPO: Proximal Policy Optimization—a standard reinforcement learning (RL) algorithm; GRPO is a variant of it that drops the learned value function (critic)