PPO: Proximal Policy Optimization—an RL algorithm that updates policies with a clipped objective to ensure stability
GRPO: Group Relative Policy Optimization—an RL method that normalizes advantages within a group of sampled outputs, avoiding the need for a separate value function
RAG: Retrieval-Augmented Generation—fetching relevant external data to augment the input prompt for an LLM
Memory Distillation: The step in which the Answer Agent filters retrieved memories, keeping only the most relevant entries before generating an answer
LoCoMo: Long Conversational Memory, a benchmark that evaluates agents on questions about temporally distant conversational history
CRUD: Create, Read, Update, Delete—standard database operations adapted here for memory management
Exact Match (EM): A metric that checks whether the generated answer is character-for-character identical to the ground truth
LLM-as-a-Judge: Using a strong LLM (like GPT-4) to evaluate the semantic correctness and quality of model outputs
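
To make the PPO and GRPO entries concrete, here is a minimal sketch of the two quantities they describe: advantages normalized within a group of sampled outputs, and the clipped per-sample surrogate. The function names, the population standard deviation, the eps stabilizer, and the 0.2 clip range are illustrative assumptions, not details taken from the source.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled outputs (GRPO-style).

    No learned value function is involved: the group mean acts as the baseline.
    eps is an assumed stabilizer for groups whose rewards are all equal.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective for a single sample.

    ratio is new_policy_prob / old_policy_prob; clip_eps=0.2 is an assumed default.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to the same question.
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)
    print(clipped_surrogate(1.5, advantages[0]))
```

Within a group, above-average outputs receive positive advantages and below-average ones negative, and the clip caps how much any single sample can change the policy in one update.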