SFT: Supervised Fine-Tuning—training a model on labeled examples (input-output pairs) to learn a specific behavior
RL: Reinforcement Learning—training an agent to maximize a reward signal by interacting with an environment
OOD: Out-of-Distribution—tasks or data that differ significantly from what the model saw during training (e.g., new tools or domains)
Grounding: Ensuring that an AI's outputs are based on verifiable facts or real execution traces rather than hallucination
Inverted Synthesis: Generating the answer or execution trace first and then deriving the question from it, so that every question is solvable by construction
Topic Collapse: A failure mode in synthetic data generation where the model repeatedly produces the same few high-frequency concepts
Trajectory: The sequence of thoughts, actions (tool calls), and observations (tool outputs) an agent generates while solving a task
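
As an illustration of the Trajectory entry above, here is a minimal sketch of how such a sequence might be stored. The class names (Step, Trajectory), the field names, and the calculator tool are hypothetical and chosen only for this example; they do not come from any particular framework.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One step of an agent's work (hypothetical structure)."""
    thought: str                       # the agent's reasoning at this step
    action: Optional[str] = None       # tool call, if any (illustrative string form)
    observation: Optional[str] = None  # tool output returned for that call


@dataclass
class Trajectory:
    """Ordered record of the steps an agent takes while solving one task."""
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, thought: str, action: Optional[str] = None,
            observation: Optional[str] = None) -> None:
        self.steps.append(Step(thought, action, observation))

    def tool_calls(self) -> List[str]:
        # Only steps that actually invoked a tool carry an action.
        return [s.action for s in self.steps if s.action is not None]


# Example: a two-step trajectory with one (made-up) tool call.
traj = Trajectory(task="What is 17 * 23?")
traj.add("I should use the calculator.", "calculator('17 * 23')", "391")
traj.add("The tool returned 391, so that is the answer.")
```

The final step has no action or observation because the agent is only concluding, which is why both fields are optional in this sketch.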