KV cache: Key-Value cache—storage of token representations in the Transformer's attention mechanism, which is limited in size and determines the working memory capacity
netting: The process of consolidating multiple financial transactions between parties to determine a final single settlement amount
RAG: Retrieval-Augmented Generation—systems that fetch relevant external data to ground LLM responses
memory organization hint: An informal text prompt provided in this benchmark that explains to the agent how a human would structure the information (e.g., 'maintain a ledger')
Mem0: A specific open-source agentic memory framework that manages memory creation and retrieval
state tracking: The ability to monitor entities as their properties change over time (e.g., location, debt), rendering previous states obsolete
LLM-as-a-judge: Using a strong Language Model to evaluate the correctness of open-ended responses from another model