M-MDP: Memory-augmented Markov Decision Process—an extension of MDPs where the state space includes an evolving external memory of past experiences
CBR: Case-Based Reasoning—a problem-solving paradigm that solves new problems by retrieving and adapting solutions from similar past problems
Pass@3: A metric measuring the probability that at least one of the top 3 generated solutions is correct
PM: Process Match, a performance metric that appears to measure how closely the agent's execution path aligns with a reference workflow; the source text does not define it precisely
MCP: Model Context Protocol—a standardized interface for connecting AI models to external tools and data sources
Soft Q-Learning: An RL algorithm that maximizes both the expected reward and the entropy of the policy, encouraging exploration and robustness
Episodic Control: A learning method that rapidly estimates values (Q-values) from highly similar past events stored in memory, rather than through slow gradient updates
TD learning: Temporal Difference learning, an RL method that updates a value estimate toward a target built from the observed reward and the current estimate for the next state (bootstrapping), instead of waiting for the final return
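
As a rough illustration of two entries above (TD learning and Soft Q-Learning), the sketch below shows a tabular TD(0) update and the entropy-regularized "soft" state value in their standard textbook forms. The function names, the dictionary-based value table, and the default values for alpha, gamma, and temperature are assumptions made for this example and do not come from the source.

```python
import math


def td0_update(v, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular TD(0) step (hypothetical helper, not from the source).

    Moves V(state) toward the bootstrapped target
    reward + gamma * V(next_state) and returns the TD error.
    Unseen states are treated as having value 0.0.
    """
    target = reward + gamma * v.get(next_state, 0.0)
    td_error = target - v.get(state, 0.0)
    v[state] = v.get(state, 0.0) + alpha * td_error
    return td_error


def soft_value(q_values, temperature=1.0):
    """Soft state value: temperature * log(sum(exp(Q / temperature))).

    This is the entropy-regularized counterpart of max(Q). The maximum is
    subtracted before exponentiating for numerical stability.
    """
    m = max(q_values)
    total = sum(math.exp((q - m) / temperature) for q in q_values)
    return m + temperature * math.log(total)


if __name__ == "__main__":
    values = {}
    err = td0_update(values, "s0", reward=1.0, next_state="s1")
    print("TD error:", err, "updated V(s0):", values["s0"])
    print("soft value:", soft_value([0.0, 0.0]))
```

As the temperature shrinks, the soft value approaches the ordinary maximum over Q-values; larger temperatures give more weight to the entropy term and so favor more exploratory policies.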