DRKG: Drug Repurposing Knowledge Graph—a comprehensive biological knowledge graph linking genes, compounds, diseases, and side effects
RAG: Retrieval-Augmented Generation—enhancing model responses by fetching relevant documents from an external corpus
Knowledge Distillation: A training method where a smaller 'student' model learns to mimic the outputs (predictions and rationales) of a larger 'teacher' model
GNN: Graph Neural Network—a neural network designed to process data represented as graphs, used here to calculate drug embeddings
Hard Negative: An irrelevant drug candidate selected because its embedding is similar to that of the relevant drug, making the choice challenging for the model
Teacher Model: A powerful LLM whose generated rationales and selections serve as the supervision targets for training the smaller local model
Bi-encoder: A retrieval architecture in which the query and each document are embedded independently into vectors, and relevance is scored by the similarity between those vectors
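
The Hard Negative entry above can be made concrete with a minimal sketch. It assumes cosine similarity as the measure of closeness and uses made-up three-dimensional vectors in place of GNN-derived drug embeddings. The drug names and the helper `mine_hard_negatives` are hypothetical, chosen only for illustration, and are not taken from the source.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mine_hard_negatives(positive, embeddings, relevant, k=2):
    """Return the k irrelevant items whose embeddings are closest to the positive.

    Hypothetical helper: `positive` is the name of the relevant drug,
    `embeddings` maps names to vectors, `relevant` is the set of names
    that must never be returned as negatives.
    """
    anchor = embeddings[positive]
    candidates = [name for name in embeddings if name not in relevant]
    # Most similar first: these are the hardest negatives to tell apart.
    candidates.sort(key=lambda name: cosine(anchor, embeddings[name]), reverse=True)
    return candidates[:k]

# Toy embeddings (assumed values, not real DRKG or GNN output).
embeddings = {
    "drug_a": [0.9, 0.1, 0.0],   # the relevant drug
    "drug_b": [0.8, 0.2, 0.1],   # very close to drug_a
    "drug_c": [0.0, 1.0, 0.0],   # nearly orthogonal to drug_a
    "drug_d": [0.7, 0.0, 0.3],   # fairly close to drug_a
    "drug_e": [-0.5, 0.2, 0.8],  # points away from drug_a
}

hard_negatives = mine_hard_negatives("drug_a", embeddings, relevant={"drug_a"}, k=2)
print(hard_negatives)  # ['drug_b', 'drug_d']
```

Because every vector is computed ahead of time and compared only through a similarity score, the same scoring step also illustrates how a bi-encoder ranks candidates once the query and documents have been embedded independently.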