MASEval: The proposed framework-agnostic evaluation library for multi-agent systems
smolagents: A minimalist, code-centric agent framework by Hugging Face
LangGraph: A graph-based agent orchestration framework by LangChain allowing explicit state management
LlamaIndex: A data framework for LLMs that supports agentic workflows and retrieval
MACS: Multi-Agent Coordination Survey—a benchmark testing multi-agent coordination on enterprise tasks
ConVerse: A benchmark measuring resistance to security attacks in agent-to-agent conversations
pGSR: Partial Goal Success Rate—a metric measuring the percentage of sub-goals successfully achieved
ASR: Attack Success Rate—the frequency with which an attacker agent successfully compromises a victim agent
GPT-5-mini: A mid-tier OpenAI model used in the paper's experiments
Gemini-3.0-Flash: A mid-tier Google model used in the paper's experiments
Claude-Haiku-4.5: A mid-tier Anthropic model used in the paper's experiments
TS: Task Score—a metric used in MultiAgentBench to quantify bargaining outcomes
Item Response Theory: A statistical framework used in the paper's AdaptiveTaskQueue to estimate a system's skill from its results on a subset of items
Orchestration logic: The code and rules governing how multiple agents interact, pass messages, and take turns
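
The two rate metrics defined above, pGSR and ASR, can be read as simple ratios. The sketch below is illustrative only: the function names and the boolean-list inputs are hypothetical, and reporting pGSR as a percentage and ASR as a fraction is an assumption drawn from the wording of the definitions, not from the benchmarks' own code. The benchmarks may aggregate across tasks differently.

```python
from typing import Sequence


def partial_goal_success_rate(sub_goals_met: Sequence[bool]) -> float:
    """Percentage of sub-goals achieved (hypothetical helper, not a benchmark API)."""
    if not sub_goals_met:
        raise ValueError("at least one sub-goal is required")
    # True counts as 1 and False as 0, so sum() is the number of achieved sub-goals.
    return 100.0 * sum(sub_goals_met) / len(sub_goals_met)


def attack_success_rate(attack_succeeded: Sequence[bool]) -> float:
    """Fraction of attack attempts that compromised the victim (hypothetical helper)."""
    if not attack_succeeded:
        raise ValueError("at least one attack attempt is required")
    return sum(attack_succeeded) / len(attack_succeeded)


if __name__ == "__main__":
    # Assumed example data: 3 of 4 sub-goals met, 1 of 4 attacks successful.
    print(partial_goal_success_rate([True, True, False, True]))
    print(attack_success_rate([True, False, False, False]))
```

A higher pGSR indicates more of the task was completed, while a lower ASR indicates stronger resistance to attack, so the two metrics point in opposite directions when comparing systems.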