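MOTA: Multiple Object Tracking Accuracy, a primary tracking metric that penalizes false positives, false negatives, and identity switches, summed over all frames and normalized by the number of ground-truth objects.
IDF1: Identification F1 score, the harmonic mean of identification precision and identification recall, measuring how consistently the tracker maintains object identities over time.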
Mamba: A recent architecture based on State Space Models (SSM) that achieves a global receptive field with computational complexity linear in sequence length, whereas Transformer self-attention scales quadratically.
SSM: State Space Model—a mathematical framework used here for efficient sequence modeling of visual features.
YOLOX: An anchor-free object detector in the YOLO (You Only Look Once) family.
EMD-Flow: An optical flow estimation network used to generate ground-truth motion labels for training.
IOU: Intersection Over Union (also written IoU), the area of overlap between two bounding boxes divided by the area of their union; it ranges from 0 (no overlap) to 1 (identical boxes).
V-SSM: Vertical State Space Model—a branch of the Mamba block scanning features vertically.
H-SSM: Horizontal State Space Model—a branch of the Mamba block scanning features horizontally.
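
The IOU and MOTA definitions above can be illustrated with a minimal sketch. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates and that MOTA is computed from error counts already aggregated over all frames; the function names `iou` and `mota` are hypothetical and not taken from the summarized work.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format (assumed)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # 20 total errors against 100 ground-truth objects.
    print(mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100))
```

Because the error count in the numerator is unbounded, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.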