D2T: Data-to-Text generation—converting structured data (tables, graphs) into natural language
Factual Inconsistency: The failure of generated text to be entailed by the input facts, often appearing as hallucinations; measured as 1 - Consistency Score
Huber Loss: A robust-regression loss that is quadratic for small residuals and linear for large ones, making it less sensitive to outliers than squared error loss
Vuong's test: A statistical likelihood-ratio test used to compare non-nested models (like power law vs. exponential) to see which fits the data better
QLoRA: Quantized Low-Rank Adaptation, a parameter-efficient fine-tuning technique that trains low-rank adapters on top of a frozen, 4-bit quantized base model to reduce memory usage
AlignScore: A metric measuring factual consistency based on information alignment between source and generation
QAFactEval: A metric assessing consistency via question generation and answering (QG-QA) pipelines
SummaC-conv: A consistency metric leveraging natural language inference (NLI) models
UniEval-fact: A unified multi-dimensional evaluator where the 'fact' dimension measures factual consistency
Goodness-of-fit: A statistical assessment (here, an F-test) of how well a model's predicted values match the observed data
Pythia: A suite of decoder-only autoregressive language models designed for research on training dynamics
OPT: Open Pre-trained Transformer—a suite of open-source decoder-only models similar to GPT-3
BLOOM: BigScience Large Open-science Open-access Multilingual Language Model, a suite of open-access multilingual decoder-only models from the BigScience project
Nucleus Sampling: A decoding strategy where the next token is sampled from the smallest set of top tokens whose cumulative probability exceeds a threshold p
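
The nucleus sampling definition above can be illustrated with a minimal sketch. It assumes a toy next-token distribution stored in a plain dictionary; the function names `nucleus_filter` and `nucleus_sample` are hypothetical and are not taken from the summarized paper or from any library.

```python
import random


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability exceeds p.

    `probs` maps token -> probability (assumed to sum to 1).
    Returns the kept tokens with probabilities renormalized to sum to 1.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total > p:  # stop as soon as the threshold is exceeded
            break
    return {token: prob / total for token, prob in kept}


def nucleus_sample(probs, p, rng=random):
    """Sample one token from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Illustrative distribution (made-up values, not from the paper).
toy_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(nucleus_filter(toy_probs, 0.7))
print(nucleus_sample(toy_probs, 0.7, random.Random(0)))
```

With a lower threshold p the nucleus shrinks toward the single most likely token, which approaches greedy decoding; with p close to 1 it approaches sampling from the full distribution.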