Atomic Thought: The minimal, functionally coherent unit of reasoning (e.g., planning, reflection) encapsulated in tags, used to structure the agent's thinking process
RRM: Reasoning Reward Model—a model used to score the quality of intermediate reasoning steps (Atomic Thoughts) rather than just the final answer
ATR: Atomic Thought Reward—the fine-grained reward signal derived from the RRM's scores for individual Atomic Thoughts
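GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that samples a group of responses for each prompt and computes every response's advantage relative to the rest of its group, without a separate learned value model; often used for reasoning tasks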
Gradient Conflict: A phenomenon in RL where the gradient updates from different parts of a trajectory (e.g., good reasoning vs. bad outcome) oppose each other, hindering learning
Agentic Deep Research: An autonomous search paradigm where LLMs perform reasoning, on-demand searching, and iterative information synthesis to answer complex questions
SFT: Supervised Fine-Tuning—training a model on labeled examples to initialize its behavior before applying reinforcement learning
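
To make the "group-relative advantages" in the GRPO entry concrete, here is a minimal sketch in Python. It assumes each sampled response already has a single scalar reward, and it normalizes with the population standard deviation; the function name, the `eps` constant, and the example rewards are illustrative assumptions rather than anything taken from the source, and real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per response sampled for the same prompt.
    `eps` (an assumed constant) guards against division by zero when every
    response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average get a positive advantage and the rest get a negative one, so no critic network is needed to supply a baseline.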