RAG: Retrieval-Augmented Generation—combines a retriever to find documents and a generator to produce answers based on them.
KILT: Knowledge Intensive Language Tasks, a benchmark suite spanning tasks such as QA, fact checking, and slot filling, all grounded in a shared Wikipedia snapshot.
FiD: Fusion-in-Decoder—an architecture that encodes retrieved documents independently and fuses them only in the decoder, allowing scaling to many documents.
DPR: Dense Passage Retrieval—uses dual encoders (BERT-based) to embed queries and passages into a shared vector space for retrieval.
MIPS: Maximum Inner Product Search, the problem of finding the vectors in a large database that have the largest inner product with a query vector; usually solved approximately for efficiency.
REALM: Retrieval-Augmented Language Model Pre-training, a method that jointly pre-trains a neural retriever and a language model using a masked language modeling objective.
EM: Exact Match, a metric that checks whether the predicted answer string exactly matches a ground-truth answer (typically after text normalization).
Natural Questions: A QA dataset consisting of real queries issued to the Google search engine.
TriviaQA: A reading comprehension dataset containing question-answer pairs authored by trivia enthusiasts.
FEVER: Fact Extraction and VERification—a benchmark dataset for fact-checking claims against Wikipedia.