Unified Protocol: A standard interface defining Task, Context, and Actions to mediate between diverse agents and benchmarks
MCP: Model Context Protocol, an open standard for connecting AI assistants to external tools, systems, and data sources
Scaffold: The software framework that wraps an LLM to enable agentic behaviors (memory, tool use, planning)
Exgentic: The proposed evaluation harness that implements the Unified Protocol
Tool shortlisting: A technique that narrows the available action space (tools) to a manageable subset before presenting it to the LLM
Zero-shot generalization: The ability of an agent to perform tasks in an unseen environment without domain-specific fine-tuning or prompt engineering
ReAct: Reason + Act, a prompting paradigm in which the model interleaves reasoning traces with actions, producing a reasoning step before each action and conditioning on the resulting observation
τ-Bench: A benchmark evaluating customer service agents in retail/airline domains, focusing on policy compliance
SWE-Bench Verified: A human-validated subset of SWE-Bench in which each task requires resolving a real GitHub issue (typically a bug fix) in a Python repository
BrowserGym: A framework consolidating web-based agent benchmarks
AppWorld: A benchmark for day-to-day digital user-assistance tasks involving multiple apps
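The entries above define the Unified Protocol only by its three components (Task, Context, Actions) and tool shortlisting only by its goal. The Python sketch below illustrates one way these pieces might fit together. It is not Exgentic's actual API: the classes `Task`, `Action`, `Context`, their fields, and the functions `tokens`, `shortlist`, and `execute` are all hypothetical. The keyword-overlap scorer is a stand-in chosen for illustration, not the shortlisting method used in this work.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stop-word list used only by the toy scorer below.
STOPWORDS = {"a", "an", "the", "to", "of", "and"}


@dataclass
class Task:
    """What the benchmark asks the agent to do (hypothetical fields)."""
    task_id: str
    instruction: str


@dataclass
class Action:
    """One callable capability exposed by the benchmark (hypothetical fields)."""
    name: str
    description: str
    run: Callable[..., str]


@dataclass
class Context:
    """Interaction state shared between agent and benchmark (hypothetical)."""
    observations: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Lowercase word tokens; "_" is not matched, so "cancel_order" -> {"cancel", "order"}.
    return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS


def shortlist(task: Task, actions: list[Action], k: int = 3) -> list[Action]:
    """Keep the k actions whose name/description share the most words with the task.

    Keyword overlap is an illustrative stand-in scorer, not a published method.
    """
    query = tokens(task.instruction)

    def score(action: Action) -> int:
        return len(query & tokens(f"{action.name} {action.description}"))

    # sorted() is stable, so ties keep the benchmark's original action order.
    return sorted(actions, key=score, reverse=True)[:k]


def execute(action: Action, context: Context, **kwargs) -> str:
    """Run an action and record its observation in the shared context."""
    observation = action.run(**kwargs)
    context.observations.append(observation)
    return observation


# Usage: a toy retail-style action space (all names hypothetical).
actions = [
    Action("search_orders", "search a customer's orders", lambda **kw: "found 2 orders"),
    Action("cancel_order", "cancel an order by id", lambda order_id: f"cancelled {order_id}"),
    Action("get_weather", "get the weather forecast", lambda **kw: "sunny"),
    Action("send_email", "send an email to a contact", lambda **kw: "sent"),
]
task = Task("t1", "Cancel the customer's most recent order")
ctx = Context()
top = shortlist(task, actions, k=2)
print([a.name for a in top])  # ['cancel_order', 'search_orders']
print(execute(top[0], ctx, order_id="A17"))  # cancelled A17
```

Only the two highest-scoring actions reach the agent here, which is the point of shortlisting: a smaller action set to reason over. A real harness might score actions with embeddings or an LLM call instead; keyword overlap simply keeps the sketch dependency-free.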