MoE: Mixture of Experts—a neural network architecture that activates only a subset of specialized sub-networks (experts) per input to save compute
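The routing idea behind MoE can be sketched in a few lines. This is a minimal NumPy illustration, not any particular model's implementation: the expert matrices, router weights, and dimensions are all hypothetical, and only the top-k experts selected per token are actually computed.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 8 small linear "experts", route each token to its top 2.
n_experts, d_model, top_k = 8, 16, 2
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts)) * 0.1

def moe_layer(x):
    # x: (tokens, d_model). Router scores decide which experts fire per token.
    gates = softmax(x @ router)                   # (tokens, n_experts)
    top = np.argsort(gates, axis=-1)[:, -top_k:]  # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = gates[t, top[t]]
        w = w / w.sum()                           # renormalize over chosen experts
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])  # only k of 8 experts run
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # → (4, 16)
```

The compute saving comes from the inner loop: each token touches k experts instead of all of them, while total parameter count stays large.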
CoT: Chain of Thought—a prompting or training method where the model generates intermediate reasoning steps before the final answer
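In its prompting form, CoT amounts to showing the model worked reasoning before asking a new question. A minimal sketch (the arithmetic examples are made up for illustration):

```python
# Few-shot chain-of-thought prompt: the demonstration includes intermediate
# reasoning steps before the final answer, nudging the model to do the same.
demo = (
    "Q: A pack has 12 pencils. How many pencils are in 3 packs?\n"
    "A: Each pack has 12 pencils. 3 packs have 3 * 12 = 36 pencils. "
    "The answer is 36.\n"
)
question = "Q: A box holds 8 apples. How many apples are in 5 boxes?\nA:"
prompt = demo + "\n" + question
print(prompt)
```

In the training form, the same step-by-step traces appear as target outputs rather than in-context demonstrations.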
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that scores each sampled output against the average reward of its group, removing the need for a separate critic model when guiding the model's reasoning improvements
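The group-relative part of GRPO can be shown directly: each output's advantage is its reward standardized against the other outputs sampled for the same prompt. A sketch with hypothetical rewards (the surrounding policy-gradient machinery is omitted):

```python
import numpy as np

def group_relative_advantages(rewards):
    """Advantage of each sampled output = its reward standardized against the
    group mean and std, so no learned value/critic model is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Hypothetical rewards for 4 answers sampled for one prompt.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # mean-zero; above-average outputs get positive advantage
```

These advantages then weight the policy-gradient update, so outputs that beat their group average are reinforced and the rest are suppressed.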
USMLE: United States Medical Licensing Examination—a standardized three-step examination for medical licensure in the U.S.
RLHF: Reinforcement Learning from Human Feedback—a technique to align model behavior with human preferences using reward signals
SFT: Supervised Fine-Tuning—training a model on labeled datasets to learn specific task behaviors before RL
Prompt Injection: A security attack where malicious inputs manipulate the model into ignoring its original instructions or safety constraints
Hallucination: When an AI generates plausible-sounding but factually incorrect or fabricated information
MLA: Multi-head Latent Attention—an attention variant that compresses keys and values into a low-rank latent vector, shrinking the KV cache and making long-context inference cheaper
Self-reflection: The model's ability to critique and revise its own reasoning steps during the generation process