RAG: Retrieval-Augmented Generation, an approach in which a system first retrieves relevant documents and then generates an answer conditioned on them
Fine-Tuning: The process of training a pre-trained model on a smaller, specific dataset to adapt it to a particular task or domain
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pre-trained weights and injects small trainable low-rank matrices into the model's layers
FSDP: Fully Sharded Data Parallel, a distributed training method that shards model parameters, gradients, and optimizer states across GPUs to reduce per-GPU memory usage
GROBID: GeneRation Of BIbliographic Data—a machine learning library for extracting structured data (metadata, sections) from scientific PDF documents
TEI: Text Encoding Initiative—a standard format for representing texts in digital form, used here for structured PDF output
FAISS: Facebook AI Similarity Search—a library for efficient similarity search and clustering of dense vectors
BM25: Best Matching 25—a ranking function used in information retrieval to estimate the relevance of documents to a search query based on keyword matching
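F1 score: The harmonic mean of precision and recall, computed as 2 × precision × recall / (precision + recall), which is high only when both are high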
ROUGE: Recall-Oriented Understudy for Gisting Evaluation, a set of metrics that score generated text, typically summaries, by its n-gram and longest-common-subsequence overlap with reference texts
BLEU: Bilingual Evaluation Understudy, a metric that scores machine-generated text, originally machine translation output, by its n-gram precision against reference texts
Guidance framework: A programming paradigm that controls LLM generation by enforcing specific structures on inputs and outputs
Cosine learning rate scheduler: A schedule that decays the learning rate from its initial value toward a minimum along a half-cosine curve over the course of training
Flash-attention: An exact attention algorithm that speeds up computation and reduces memory usage in Transformers by tiling the computation to limit GPU memory reads and writes
p.p.: Percentage points—the arithmetic difference between two percentages