Execution fidelity: A continuous score p_{i,t} in [0, 1] estimating the probability that robot i can reliably make progress at time t by following its global plan under current local conditions
Voronoi partition: A geometric method that divides a map into regions, one per robot, where each region contains the locations closer to that robot than to any other; used here to allocate exploration frontiers among robots
BFS: Breadth-First Search—a graph traversal algorithm used here to compute shortest-path distances on the grid
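The Voronoi and BFS entries fit together naturally on a grid map: a multi-source BFS started from every robot cell at once labels each free cell with the robot that reaches it in the fewest steps, and each frontier cell then inherits that robot as its owner. The sketch below is a minimal illustration, assuming a 4-connected occupancy grid given as strings with '#' for obstacles. The function name `grid_voronoi`, the grid encoding, and the tie-breaking by BFS visiting order are hypothetical choices, not details taken from the summarized method.

```python
from collections import deque


def grid_voronoi(grid, robots):
    """Assign each free cell to the robot with the shortest BFS path to it.

    grid: list of equal-length strings; '#' is an obstacle, anything else is free.
    robots: list of (row, col) start cells, one per robot.
    Returns (owner, dist): owner[r][c] is a robot index, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    owner = [[None] * cols for _ in range(rows)]
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Multi-source BFS: seed the queue with every robot at distance 0.
    for idx, (r, c) in enumerate(robots):
        owner[r][c] = idx
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                owner[nr][nc] = owner[r][c]  # first robot to arrive is nearest
                queue.append((nr, nc))
    return owner, dist


# Usage: two robots on a small map; each frontier goes to its nearest robot.
grid = ["....",
        ".##.",
        "...."]
robots = [(0, 0), (2, 3)]
owner, dist = grid_voronoi(grid, robots)
frontiers = [(0, 3), (2, 0)]
assignment = {f: owner[f[0]][f[1]] for f in frontiers}
print(assignment)
```

Because BFS expands cells in order of increasing distance, the first robot to reach a cell is a nearest one by path length, so the partition respects walls rather than straight-line distance.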
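A*: A* search, a best-first pathfinding algorithm that expands nodes in order of f(n) = g(n) + h(n), the cost so far plus a heuristic estimate of the remaining cost; it returns a shortest path to the goal when the heuristic never overestimates that remaining cost (an admissible heuristic)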
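To make the f(n) = g(n) + h(n) ordering concrete, here is a minimal A* sketch on the same kind of 4-connected grid, assuming unit move costs and the Manhattan-distance heuristic (admissible for such moves). The function name `astar` and the grid encoding are illustrative; the summarized method's planner details are not specified here.

```python
import heapq


def astar(grid, start, goal):
    """Shortest path length on a 4-connected grid, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance never overestimates for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}  # best known cost from start to each cell
    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    while open_heap:
        _, g_cur, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g_cur
        if g_cur > g[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                new_g = g_cur + 1
                if new_g < g.get((nr, nc), float('inf')):
                    g[(nr, nc)] = new_g
                    heapq.heappush(open_heap, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None


# Usage: route around a wall, then a goal sealed off by obstacles.
grid = ["....",
        ".##.",
        "...."]
print(astar(grid, (0, 0), (2, 3)))
print(astar([".#", "#."], (0, 0), (1, 1)))
```

Returning only the path length keeps the sketch short; recording a parent pointer per cell when `g` improves would let it reconstruct the path itself.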
Self-supervised adaptation: Updating the model using labels generated from the robot's own experience (e.g., whether it got stuck) rather than from human annotation
Hysteresis: A switching rule that uses separate thresholds for entering and leaving a state, often combined with a minimum duration the signal must persist, so that a signal hovering near a single threshold does not cause rapid oscillation between states
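A small example shows how hysteresis could gate the hand-off between following the global plan and falling back to a reactive policy, driven by an execution-fidelity score p. This is a sketch under stated assumptions: the class name `HysteresisSwitch`, the thresholds 0.3 and 0.7, the 3-step dwell time, and the choice to switch modes on p at all are hypothetical illustrations, not the summarized method's actual rule or parameters.

```python
class HysteresisSwitch:
    """Two-threshold mode switch with a minimum dwell time (illustrative values).

    Mode "global" follows the global plan; mode "reactive" hands control to a
    local reactive policy. Leaving "global" requires p < low for `dwell`
    consecutive steps; leaving "reactive" requires p > high for `dwell`
    consecutive steps. Any step that fails the condition resets the counter.
    """

    def __init__(self, low=0.3, high=0.7, dwell=3):
        assert low < high, "the gap between thresholds is what gives hysteresis"
        self.low, self.high, self.dwell = low, high, dwell
        self.mode = "global"
        self.count = 0

    def update(self, p):
        # Which crossing counts depends on the current mode.
        if self.mode == "global":
            crossing = p < self.low
        else:
            crossing = p > self.high
        self.count = self.count + 1 if crossing else 0
        if self.count >= self.dwell:
            self.mode = "reactive" if self.mode == "global" else "global"
            self.count = 0
        return self.mode


# Usage: brief dips below 0.3 are ignored; a sustained dip triggers the switch,
# and scores between 0.3 and 0.7 never flip the mode.
sw = HysteresisSwitch()
scores = [0.5, 0.2, 0.25, 0.5, 0.1, 0.2, 0.2, 0.6, 0.8, 0.9, 0.95]
modes = [sw.update(p) for p in scores]
print(modes)
```

The two mechanisms complement each other: the threshold gap suppresses chattering from noise around one value, and the dwell counter suppresses short spikes that cross the gap.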
Reactive policy: A local control policy (often trained with reinforcement learning, RL) that maps immediate sensor readings to actions without long-term planning
MAPF: Multi-Agent Path Finding—the problem of finding collision-free paths for multiple agents from start to goal locations
Pseudo-labels: Training targets derived automatically from heuristic rules or from outcomes observed after the fact (e.g., whether the robot crashed) rather than from ground-truth annotation
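The pseudo-label and self-supervised adaptation entries describe labels computed in hindsight from the robot's own logs. How the summarized method defines them is not stated here. The sketch below illustrates one plausible heuristic, assuming a logged sequence of remaining distance-to-goal values: a step is labeled 1 if the robot made enough progress over a short look-ahead window, else 0 ("stuck"). The function name `progress_pseudo_labels` and the `horizon` and `min_progress` parameters are hypothetical.

```python
def progress_pseudo_labels(dist_to_goal, horizon=5, min_progress=0.5):
    """Hindsight pseudo-labels from a logged run (hypothetical heuristic).

    dist_to_goal[t] is the remaining path distance to the goal at step t.
    Step t gets label 1 ("made progress") if that distance shrank by at least
    `min_progress` over the next `horizon` steps, else 0 ("stuck").
    The last `horizon` steps lack a full look-ahead window and stay unlabeled.
    """
    labels = []
    for t in range(len(dist_to_goal) - horizon):
        progress = dist_to_goal[t] - dist_to_goal[t + horizon]
        labels.append(1 if progress >= min_progress else 0)
    return labels


# Usage: the robot stalls at distance 8 for a few steps, then recovers.
dist = [10, 9, 8, 8, 8, 8, 8, 7, 6]
labels = progress_pseudo_labels(dist, horizon=3, min_progress=1.0)
print(labels)
```

Labels like these could serve as binary targets for a predictor of a score such as p_{i,t}; the look-ahead window trades label delay against robustness to momentary stalls.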
D*: Dynamic A*—an incremental search algorithm efficient for replanning in changing environments
Gazebo: A widely used open-source 3D robot simulator that models physics, sensors, and environments