KG: Knowledge Graph—a structured representation of knowledge using entities (nodes) and relations (edges)
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
Reasoning Path: A sequence of KG triples connecting a start entity (from the question) to an end entity (the answer)
Discriminative Evaluation: A multiple-choice setting where the model must identify the correct reasoning path among invalid distractors
Generative Evaluation: A setting where the model generates the reasoning text freely, which is then parsed and checked against the KG
Needleman-Wunsch algorithm: A dynamic-programming algorithm for global sequence alignment, used here to measure the edit distance between the generated reasoning path and the ground-truth path
Sentence-BERT: A modification of BERT that uses a siamese network architecture to produce semantically meaningful sentence embeddings, which can be compared using cosine similarity
Triple Verbalization: Converting a KG triple (Subject, Relation, Object) into a natural language sentence for embedding
AUC: Area Under the Curve—a performance metric for classification tasks, where 1.0 is perfect and 0.5 is random guessing
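
The Reasoning Path, Triple Verbalization, and Needleman-Wunsch entries above can be illustrated with a minimal sketch. The function names (`verbalize_triple`, `needleman_wunsch`), the sentence template, the example triples, and the scoring values (match, mismatch, gap) are all assumptions made for illustration; the glossary does not specify how the original work verbalizes triples or scores alignments.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def verbalize_triple(triple: Triple) -> str:
    """Turn a KG triple into a plain sentence (hypothetical template)."""
    subject, relation, obj = triple
    return f"{subject} {relation.replace('_', ' ')} {obj}."


def needleman_wunsch(a: Sequence, b: Sequence,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score between sequences a and b.

    The default scores are illustrative assumptions, not values from the source.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


# Hypothetical two-hop reasoning path and a generated path missing the last hop.
gold_path = [("Paris", "capital_of", "France"), ("France", "member_of", "EU")]
generated_path = [("Paris", "capital_of", "France")]

full_score = needleman_wunsch(gold_path, gold_path)
partial_score = needleman_wunsch(generated_path, gold_path)
sentence = verbalize_triple(gold_path[0])
```

With `match=0, mismatch=-1, gap=-1`, the negated score equals the Levenshtein edit distance, which is one way to read the "edit distance" wording in the Needleman-Wunsch entry. The verbalized sentences are what a sentence encoder such as Sentence-BERT would embed before comparison by cosine similarity.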