A1: Tool Execution Signaled Agent Adaptation—optimizing the agent using verifiable feedback from tools (e.g., code compilation success)
A2: Agent Output Signaled Agent Adaptation—optimizing the agent based on the quality of its final reasoning or answer (e.g., preference optimization on reasoning traces)
T1: Agent-Agnostic Tool Adaptation—training tools independently of the specific agent (e.g., pre-training a dense retriever on a general corpus)
T2: Agent-Supervised Tool Adaptation—tuning tools using feedback from the specific frozen agent's performance (e.g., rewarding a retriever if the agent answers correctly)
PEFT: Parameter-Efficient Fine-Tuning—adapting models by updating only a small set of parameters (such as adapters) rather than the full model
SFT: Supervised Fine-Tuning—training a model on labeled examples of desired behavior
DPO: Direct Preference Optimization—aligning models to preferences by optimizing directly on pairs of preferred and rejected outputs, without training a separate reward model
RAG: Retrieval-Augmented Generation—systems that retrieve documents to ground generation
MCP: Model Context Protocol—a standardized way for agents to interface with external tools and data
ReAct: Reasoning + Acting—a prompting paradigm in which agents interleave reasoning traces with actions and the observations those actions return
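
To make the difference between the A1 and T2 signals concrete, here is a minimal sketch of how each reward might be computed. It takes the two examples given in the definitions above (code compilation success for A1, a correct answer from the frozen agent for T2) and turns them into scoring functions. The names `a1_compile_reward`, `t2_retriever_reward`, and `toy_frozen_agent` are hypothetical and introduced only for illustration; exact-match scoring and Python's built-in `compile` are assumed stand-ins for whatever verifier a real system would use.

```python
from typing import Callable, List


def a1_compile_reward(source: str) -> float:
    """A1-style signal: score the agent's output with a verifiable tool result.

    Assumption: the 'tool' is Python's built-in compile(), used as a syntax check.
    """
    try:
        compile(source, "<agent_output>", "exec")
        return 1.0
    except (SyntaxError, ValueError):
        return 0.0


def t2_retriever_reward(
    frozen_agent: Callable[[str, List[str]], str],
    question: str,
    retrieved_docs: List[str],
    gold_answer: str,
) -> float:
    """T2-style signal: score the tool (a retriever) by the frozen agent's answer.

    The agent is only called, never updated; the reward goes to the retriever.
    Assumption: correctness is judged by case-insensitive exact match.
    """
    prediction = frozen_agent(question, retrieved_docs)
    return 1.0 if prediction.strip().lower() == gold_answer.strip().lower() else 0.0


def toy_frozen_agent(question: str, docs: List[str]) -> str:
    """Hypothetical stand-in for a frozen agent: answers only if a doc supports it."""
    return "Paris" if any("Paris" in d for d in docs) else "unknown"


if __name__ == "__main__":
    print(a1_compile_reward("x = 1 + 2"))   # syntactically valid source
    print(a1_compile_reward("def f(:"))     # syntax error
    print(t2_retriever_reward(
        toy_frozen_agent,
        "What is the capital of France?",
        ["Paris is the capital of France."],
        "Paris",
    ))
```

In the A1 case the reward would drive updates to the agent itself, whereas in the T2 case the same kind of scalar would drive updates to the retriever while the agent's parameters stay fixed.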