Stochastic Depth: A regularization technique that randomly drops entire layers (residual blocks) during training to shorten the effective network depth and improve robustness
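A minimal pure-Python sketch of the idea behind stochastic depth; the toy residual block and function names are illustrative, not taken from the paper. During training each residual block's transform is skipped with probability p_drop (the identity shortcut is kept), and surviving blocks are rescaled so the expected output matches inference:

```python
import random

def residual_block(x, weight):
    # toy stand-in for a residual block's transform: f(x) = weight * x
    return weight * x

def stochastic_depth_forward(x, weights, p_drop=0.2, training=True, rng=random):
    """Forward pass through a stack of residual blocks, randomly
    dropping each block's transform with probability p_drop in training."""
    for w in weights:
        if training:
            if rng.random() < p_drop:
                # block dropped: only the identity shortcut remains
                continue
            # surviving block: scale by 1/(1 - p_drop) so the expected
            # training output matches the full-depth inference network
            x = x + residual_block(x, w) / (1.0 - p_drop)
        else:
            # inference: every block is active, no rescaling needed
            x = x + residual_block(x, w)
    return x
```

With p_drop = 0 the training pass reduces to the plain full-depth forward pass; larger p_drop shortens the expected depth seen during training.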
Linear Probing: A training phase where the pre-trained backbone is frozen and only the final linear classification head is updated
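A small pure-Python sketch of linear probing; the "backbone" here is a hypothetical fixed feature extractor, not the paper's model. The key point is that only the linear head's weights are updated, while the backbone's outputs are computed once and treated as constants:

```python
import math

def frozen_backbone(x):
    # stand-in for a pre-trained feature extractor whose weights are
    # never updated during probing (hypothetical toy features)
    return [x, x * x, 1.0]  # last entry serves as a bias feature

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_probe_train(data, labels, lr=0.5, epochs=200):
    """Fit only a linear classification head on frozen backbone
    features, via gradient descent on the logistic loss."""
    feats = [frozen_backbone(x) for x in data]  # computed once: backbone is frozen
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # gradient step updates the head weights only
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w

def linear_probe_predict(w, x):
    f = frozen_backbone(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
```

In practice the same effect is achieved in a deep-learning framework by setting requires_grad to False on the backbone and optimizing only the head's parameters.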
Macro AUROC: Area Under the Receiver Operating Characteristic curve, averaged across all classes (treating all classes equally regardless of frequency)
Macro AUPRC: Area Under the Precision-Recall Curve, averaged across all classes; more informative than AUROC when classes are imbalanced
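A short pure-Python sketch of macro averaging, illustrated with AUROC (computed here via the Mann-Whitney U statistic); function names and inputs are illustrative, not from the paper. The macro average computes a one-vs-rest score per class and then takes the unweighted mean, so rare classes count exactly as much as common ones:

```python
def binary_auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(score_matrix, label_matrix):
    """Macro average: one-vs-rest AUROC per class, then the unweighted
    mean over classes (each class weighted equally, regardless of size)."""
    n_classes = len(score_matrix[0])
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        per_class.append(binary_auroc(scores, labels))
    return sum(per_class) / n_classes
```

Macro AUPRC follows the same pattern, with the per-class AUROC replaced by the per-class area under the precision-recall curve.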
Transformer-FM: The Transformer-based foundation model used as the base model in this paper, adapted from ST-MEM (a self-supervised, masked-modeling pretraining approach for ECG representation learning)
PTB-XL: A large, publicly available dataset of 12-lead ECGs with standardized clinical annotations used for benchmarking