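GRPO: Group Relative Policy Optimization, a reinforcement learning algorithm that scores each sampled output relative to the other outputs sampled for the same prompt, normalizing rewards by the group's mean and standard deviation to stabilize training without a learned value function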
Semantic Neighborhood Modeling: Constructing a cluster of semantically similar queries to evaluate whether a memory update generalizes beyond the specific instance that generated it
Marginal Utility Reward: A reward function measuring the incremental benefit (success or efficiency) a memory update provides compared to a reference execution without that update
Mem-Optimizer: The trainable module in UMEM responsible for extracting insights from trajectories and deciding how to update the memory bank
Self-evolving Agent: An AI agent that improves its performance over time by updating its external memory or parameters based on experience
Online Memory Evolution: Updating the memory bank dynamically during the training process with the best-rated rollouts, forcing the agent to adapt to a changing memory state
CSR: Cumulative Success Rate, a metric tracking the total number of successful tasks over a sequence of interactions
Instance-Specific Noise: Details in a memory that are unique to one specific example and that do not help, or may even hurt, when the memory is applied to similar but different problems
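
To make a few of the terms above concrete, the following minimal sketch shows how a Marginal Utility Reward could be averaged over a semantic neighborhood and then turned into GRPO-style group-relative advantages. It is an illustration under stated assumptions, not UMEM's implementation: the Execution record, the function names, the efficiency_weight parameter, the step-based efficiency term, and the use of the population standard deviation are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Execution:
    """Outcome of one task run (hypothetical record, not UMEM's schema)."""
    success: bool
    steps: int


def marginal_utility(with_update, reference, efficiency_weight=0.5):
    """Incremental benefit of a memory update versus a run without it.

    Assumed form: success gain, plus a weighted relative step saving
    that only counts when both runs succeed.
    """
    success_gain = float(with_update.success) - float(reference.success)
    efficiency_gain = 0.0
    if with_update.success and reference.success and reference.steps > 0:
        efficiency_gain = (reference.steps - with_update.steps) / reference.steps
    return success_gain + efficiency_weight * efficiency_gain


def neighborhood_reward(pairs, efficiency_weight=0.5):
    """Average marginal utility over (with_update, reference) runs on
    semantically similar queries, so instance-specific gains are diluted."""
    return mean([marginal_utility(w, r, efficiency_weight) for w, r in pairs])


def group_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards the all-equal case
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two candidate memory updates, each evaluated on two neighboring queries.
candidate_a = [
    (Execution(True, 5), Execution(True, 10)),    # same outcome, fewer steps
    (Execution(True, 4), Execution(False, 10)),   # turns a failure into a success
]
candidate_b = [
    (Execution(True, 10), Execution(True, 10)),   # no change
    (Execution(False, 10), Execution(False, 10)), # no change
]
rewards = [neighborhood_reward(candidate_a), neighborhood_reward(candidate_b)]
advantages = group_advantages(rewards)
```

In this toy group, the update that helps across its neighborhood receives a positive advantage and the update that changes nothing receives a negative one, even though neither reward is large in absolute terms. That relative ranking is what the group normalization is meant to provide.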