Test-time learning: The ability of a model to learn and adapt its behavior during the inference phase (deployment) without updating its permanent weights
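ReAct: Reason+Act, a paradigm in which LLMs interleave reasoning traces with actions, generating a reasoning step before each action and using the resulting observation to inform the next step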
Experience Reuse: The ability to abstract and apply successful strategies from past tasks to new, similar problems, distinct from simply recalling facts
RAG: Retrieval-Augmented Generation—fetching relevant data from external storage to ground LLM generation
MDP: Markov Decision Process—a mathematical framework for modeling decision making where outcomes are partly random and partly under the control of a decision maker
ReMem: The authors' proposed agent framework that integrates reasoning, acting, and memory refinement into a single decision loop
ExpRAG: The authors' baseline method that retrieves and aggregates past experiences using in-context learning
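
To make the retrieve-and-aggregate idea behind the ExpRAG entry concrete, here is a minimal sketch of experience retrieval followed by in-context aggregation. It is an illustration under stated assumptions, not the authors' implementation: the `Experience` record, the `jaccard` token-overlap similarity (a stand-in for whatever embedding model a real system would use), and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical record of one past task attempt."""
    task: str       # description of the past task
    strategy: str   # what the agent did
    success: bool   # whether the strategy worked


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for an embedding model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(memory: list[Experience], query: str, k: int = 2) -> list[Experience]:
    """Return the k past experiences whose task text is most similar to the query."""
    ranked = sorted(memory, key=lambda e: jaccard(e.task, query), reverse=True)
    return ranked[:k]


def build_prompt(memory: list[Experience], query: str, k: int = 2) -> str:
    """Aggregate the retrieved experiences into an in-context prompt for the new task."""
    lines = ["Past experiences:"]
    for e in retrieve(memory, query, k):
        outcome = "succeeded" if e.success else "failed"
        lines.append(f"- Task: {e.task} | Strategy: {e.strategy} | Outcome: {outcome}")
    lines.append(f"New task: {query}")
    return "\n".join(lines)


memory = [
    Experience("sort a list of integers", "use the built-in sort", True),
    Experience("parse a date string", "try strptime with known formats", True),
    Experience("sort a list of names", "sort with a lowercase key", True),
]
prompt = build_prompt(memory, "sort a list of floats", k=2)
print(prompt)
```

The prompt carries strategies and outcomes rather than raw facts, which is the distinction the Experience Reuse entry draws. In this sketch the memory is read-only at inference time; a framework such as ReMem would, per its definition above, also refine the stored experiences inside the decision loop.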