GRPO: Group Relative Policy Optimization, an RL algorithm that samples a group of outputs per prompt and estimates each output's advantage relative to the rest of its group rather than from a learned value function.
DDI: Drug-Drug Interaction, a situation where one drug alters the activity or effect of another drug when both are administered together.
Jaccard Similarity: A statistic for gauging the similarity and diversity of sample sets, computed as the size of their intersection divided by the size of their union.
Potential-based Reward Shaping: A technique in RL where an additional reward, defined as the discounted change in a potential function over states, is added to guide the agent without altering the optimal policy.
Point-wise prediction: Evaluating items (drugs) independently one by one, ignoring the context of other selected items.
List-wise prediction: Generating or evaluating an entire ordered list of items together, capturing dependencies between them.
MIMIC-III: Medical Information Mart for Intensive Care III—a widely used dataset of de-identified health data.
eICU: eICU Collaborative Research Database—a multi-center critical care dataset.
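
Three of the terms above (Jaccard Similarity, the group-relative advantage behind GRPO, and Potential-based Reward Shaping) reduce to short formulas. The Python sketch below illustrates them under stated assumptions and is not a reference implementation. The function names, the `eps` stabilizer, the use of the population standard deviation, and the convention that two empty sets have similarity 1.0 are all illustrative choices rather than anything specified in this glossary.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Assumed convention: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward against the mean and spread of its own group.

    No learned value function is involved; the group itself is the baseline.
    eps (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Add the potential-based shaping term gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    # Hypothetical predicted vs. reference drug sets.
    predicted = {"aspirin", "metformin", "warfarin"}
    reference = {"aspirin", "metformin", "insulin"}
    print(jaccard(predicted, reference))  # 2 shared / 4 total = 0.5

    # Four sampled outputs for one prompt, with scalar rewards.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))

    # Environment reward 1.0, potential rising from 0.2 to 0.5.
    print(shaped_reward(1.0, phi_s=0.2, phi_next=0.5, gamma=0.9))
```

In the second call, outputs with above-average reward receive positive advantages and those below average receive negative ones, which is the sense in which the advantage is "group relative".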