Flow-GRPO: Flow-based Group Refined Policy Optimization, the proposed on-policy algorithm that credits each single-turn planner update with the reward earned by the whole trajectory
In-the-flow: Optimization occurring within the active execution loop of the agent, rather than on static, offline data
Agentic system: A system composed of specialized modules (planner, executor, etc.) that collaborate to solve tasks, as opposed to a single monolithic model
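GRPO: Group Relative Policy Optimization, an RL algorithm that computes each sampled output's advantage by normalizing its reward against the other outputs in the same group, which reduces variance without a learned value function
PPO: Proximal Policy Optimization, a policy-gradient method for reinforcement learning that limits how far each update can move the policy in order to keep training stable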
MDP: Markov Decision Process—a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker
Trajectory: The sequence of states, actions, and observations generated by the agent from the start of a task to its completion
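
As a rough illustration of the GRPO and Flow-GRPO entries above, the following Python sketch normalizes trajectory-level rewards within a group and then copies each trajectory's advantage to every planner turn in it. The function names, the `eps` constant, and the sample rewards are assumptions made for illustration; the full algorithm involves further details (for example the policy-gradient objective itself) that are not shown here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its group.

    `rewards` holds one scalar outcome reward per sampled trajectory.
    `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantages, turns_per_trajectory):
    """Give every turn of a trajectory that trajectory's advantage.

    This mirrors the idea of assigning a trajectory-level signal to
    each single-turn planner update; it is a sketch, not the paper's code.
    """
    return [[a] * n for a, n in zip(advantages, turns_per_trajectory)]


# Hypothetical group of four rollouts with binary outcome rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
turns = [2, 3, 1, 2]  # number of planner turns in each rollout

advantages = group_relative_advantages(rewards)
per_turn = broadcast_to_turns(advantages, turns)
```

In this sketch, successful rollouts receive a positive advantage on every turn and failed rollouts a negative one, so each single-turn update is pushed in the direction of the final outcome.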