MM-LLMs: Multi-modal Large Language Models, AI models that accept inputs in more than one modality (typically text and images) and generate text; some can also generate images.
Agentic behavior: The ability of a model to autonomously decide to take external actions (like calling a tool or asking a sub-agent) before answering, rather than just generating text immediately.
Trajectory: The recorded sequence of thoughts, actions, and observations an agent takes to solve a problem.
Stratified supervision: A training strategy where data is organized by difficulty; easy samples teach direct answering, while hard samples teach complex tool use.
Frontier models: The most capable AI models currently available, usually proprietary and closed-source (e.g., GPT-4, Gemini Ultra).
SFT: Supervised Fine-Tuning, i.e., further training a pretrained model on labeled input-output examples.
Prospective trajectories: Traces recorded during the agent's actual forward attempt to solve a problem (exploratory).
Retrospective trajectories: Traces generated after the fact, rewriting the reasoning path to be cleaner and more logical based on the known outcome (hindsight).
Behavioral Cloning: A method where a student model learns to mimic the exact actions taken by a teacher model in a given situation.
OOD: Out-of-Distribution, i.e., data or tasks that differ significantly from what the model was trained on.
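To make "trajectory" and "stratified supervision" concrete, here is a minimal Python sketch of how a trajectory might be stored and turned into SFT targets by difficulty. Everything named here is a hypothetical illustration, not a method taken from the source: the `Step`/`Trajectory`/`Sample` classes, the `difficulty` score in [0, 1], the `threshold` of 0.5, the `calculator(...)` tool call, and the ReAct-style "Thought/Action/Observation" text format are all assumptions. Easy samples are reduced to a direct question-to-answer pair, while hard samples keep the full tool-use trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One step of a trajectory: a thought, an optional action, an optional observation."""
    thought: str
    action: Optional[str] = None       # hypothetical tool call; None means no tool was used
    observation: Optional[str] = None  # tool output returned to the agent


@dataclass
class Trajectory:
    """The recorded sequence of steps an agent took, plus its final answer."""
    question: str
    steps: List[Step] = field(default_factory=list)
    final_answer: str = ""


@dataclass
class Sample:
    trajectory: Trajectory
    difficulty: float  # hypothetical score in [0, 1]; how it is computed is an assumption


def to_sft_target(sample: Sample, threshold: float = 0.5) -> Dict[str, str]:
    """Stratified supervision (sketch): easy -> direct answer, hard -> full tool-use trace."""
    traj = sample.trajectory
    if sample.difficulty < threshold:
        # Easy sample: teach the model to answer directly, with no tool calls.
        return {"prompt": traj.question, "target": traj.final_answer}
    # Hard sample: serialize every step so the model learns when and how to call tools.
    lines = []
    for step in traj.steps:
        lines.append(f"Thought: {step.thought}")
        if step.action is not None:
            lines.append(f"Action: {step.action}")
        if step.observation is not None:
            lines.append(f"Observation: {step.observation}")
    lines.append(f"Answer: {traj.final_answer}")
    return {"prompt": traj.question, "target": "\n".join(lines)}


def build_sft_dataset(samples: List[Sample], threshold: float = 0.5) -> List[Dict[str, str]]:
    """Convert every sample into a prompt/target pair according to its difficulty stratum."""
    return [to_sft_target(s, threshold) for s in samples]


# Usage with two toy samples (illustrative data only).
easy = Sample(Trajectory("What is 2 + 2?", [], "4"), difficulty=0.1)
hard = Sample(
    Trajectory(
        "What is 17 * 23?",
        [Step("Multiplication is error-prone; use the calculator.",
              "calculator('17 * 23')", "391")],
        "391",
    ),
    difficulty=0.8,
)
dataset = build_sft_dataset([easy, hard])
for example in dataset:
    print(example["target"], end="\n---\n")
```

A single threshold is the simplest possible stratification; a real pipeline might use more than two buckets or derive difficulty from, for example, how often a teacher model solves the task without tools.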