CDE: Clinical Data Elements—fundamental units of healthcare information (e.g., patient demographics, diagnoses, lab tests)
Composite CDE: A data element containing interdependent or hierarchical attributes (e.g., a family history entry documenting both a diagnosis and the affected relative)
Atomic CDE: A data element representing a single characteristic (e.g., 'sex' or 'blood group')
OMOP: Observational Medical Outcomes Partnership, whose Common Data Model (CDM) standardizes the structure and semantics of observational health data
RAG: Retrieval-Augmented Generation—combining generative models with external knowledge retrieval to improve accuracy
SapBERT: A BERT-based model pretrained with a self-alignment objective on biomedical entity names so that synonymous medical terms receive similar embeddings
SPLADE: SParse Lexical AnD Expansion model, a neural sparse retrieval model that learns term weights over the vocabulary and expands text with related terms, so matching goes beyond exact keyword overlap
BioSyn: A biomedical entity representation model that uses synonym marginalization to link entity mentions to concepts
KRISSBERT: A BERT-based model trained with knowledge-rich self-supervision for biomedical entity linking
In-context learning: A technique where a language model learns to perform a task from examples provided in the prompt without parameter updates
Ensemble retrieval: Using multiple retrieval methods (here, dense via SapBERT and sparse via SPLADE) simultaneously to capture different types of relevance
Knowledge Reservoir: A storage module in the framework that caches validated label-concept pairs to speed up future inference
Self-consistency prompting: Prompting the large language model (LLM) multiple times with the same query and aggregating the results (e.g., via confidence scores) to improve reliability
Dense retrieval: Retrieval based on semantic vector similarity (embedding space)
Sparse retrieval: Retrieval based on keyword matching (lexical overlap)