Nonparametric router: A routing system that estimates a model's performance on a query from its similarity to labeled training examples (e.g., by similarity-weighted averaging) rather than with a trained neural network
Parametric router: A routing system that uses a trained neural network, e.g., an MLP (multilayer perceptron), to predict model performance
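The nonparametric estimate above can be sketched as a similarity-weighted average of scores observed on training queries. This is a minimal illustration, not the paper's exact method; the function name, the softmax-style weighting, and the temperature value are assumptions.

```python
import numpy as np

def estimate_accuracy(query_emb, train_embs, train_scores, temperature=0.1):
    """Nonparametric performance estimate: similarity-weighted mean of
    scores observed on training queries (hypothetical sketch)."""
    # cosine similarity between the query and each training embedding
    sims = train_embs @ query_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    # softmax over similarities: nearby training queries dominate
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    return float(weights @ train_scores)
```

A lower temperature makes the estimate behave more like nearest-neighbor lookup; a higher one averages over more of the training set.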
Exponential tilt: A reweighting technique that multiplies a base distribution by an exponential function of a feature (here, proximity) to shift probability mass
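The exponential tilt can be written as a one-liner: multiply the base distribution by exp(proximity), then renormalize. A minimal sketch, assuming proximity is a per-model score and using an illustrative temperature parameter not taken from the source:

```python
import numpy as np

def exponential_tilt(base_probs, proximity, temperature=1.0):
    """Reweight base_probs by exp(proximity / temperature) and renormalize,
    shifting probability mass toward high-proximity options."""
    tilted = np.asarray(base_probs) * np.exp(np.asarray(proximity) / temperature)
    return tilted / tilted.sum()
```

With temperature -> infinity the tilt vanishes and the base distribution is returned unchanged; small temperatures concentrate mass on the highest-proximity option.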
AUC: Area Under the Curve—here referring to the area under the accuracy-cost tradeoff curve
Inlier: A query that comes from a task or distribution well-represented in the training set
Outlier: A query from a task or distribution not present or poorly represented in the training set
Lagrangian relaxation: A method that converts a constrained optimization problem (maximize accuracy subject to a cost budget) into an unconstrained one by subtracting a cost penalty weighted by a parameter lambda
MPNet: A sentence embedding model used to convert text queries into vector representations