The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

📝 Paper Summary

AI Agent Security Systematization of Knowledge (SoK)

This paper establishes the first comprehensive security framework for Agentic AI, categorizing risks through seven design dimensions (such as input trust and tool access) rather than just isolated model vulnerabilities.

Core Problem

Current security research focuses on isolated components (e.g., jailbreaking standalone LLMs) and fails to address the complex system-level risks introduced by agents' flexible integration of tools, memory, and autonomy.

Why it matters:

Agents now control sensitive operations like banking and coding, making vulnerabilities physically and financially consequential
The combination of unconstrained data flow and autonomous tool execution creates attack surfaces fundamentally different from traditional software or static models
Existing defenses for standalone models (like alignment) are insufficient against system-level threats like indirect injection via tool outputs

Concrete Example: An attacker registers a malicious software package with a name that LLMs frequently hallucinate. When a coding agent hallucinates this package name during a task, it automatically installs the malware, turning a model error into a full system compromise (Package Hallucination Attack).

Key Novelty

7-Dimension Agent Design Framework

Characterizes agents via seven continuous flexibility spectra: Input Trust, Workflow, Access Sensitivity, Action, Tool, Memory, and User Interface
Maps these dimensions to specific security risks, showing how increased flexibility (e.g., from 'read-only' to 'environment-modifying' actions) directly expands the attack surface
Categorizes threats into three distinct adversary models: External (environment manipulation), User-level (direct input), and Internal (model/memory poisoning)

Architecture

General structure of an AI Agent showing the interaction between the 'Brain' (LLM) and non-AI components

Evaluation Highlights

Systematized 128 papers from top-tier venues (2023–2025), identifying 51 distinct attack methods specific to agents
Identified and categorized 60 existing defense mechanisms applicable to agentic systems
Developed a unified taxonomy of 7 security risk categories (e.g., Heterogeneous Untrusted Interfaces, Unconstrained Data Flow) spanning the CIA triad

Breakthrough Assessment

9/10

Provides the foundational taxonomy for a rapidly emerging field. By shifting focus from 'model security' to 'agent system security', it defines the roadmap for future research.

⚙️ Technical Details

Problem Definition

Setting: Security analysis of hybrid software systems (Agents) that combine probabilistic AI models with deterministic software components

Inputs: User queries, External environment state (web, tools), Memory contents

Outputs: Actions (tool calls), Generated responses, State updates

Pipeline Flow

Planner (LLM decomposes task)
Actor (LLM executes steps)
Tool Execution (Interaction with Environment)
Memory (Storage/Retrieval)

System Modules

Planner (Orchestration)

Decompose user tasks into step-by-step plans

Model or implementation: LLM (Brain)

Actor (Orchestration)

Execute individual steps by invoking tools or querying memory

Model or implementation: LLM

Memory

Store internal knowledge and historical trajectories

Model or implementation: Vector Database / Key-Value Store

Tools

Interact with external environment (Read/Write)

Model or implementation: Traditional Software Functions / APIs

Novel Architectural Elements

The paper defines a generalized 'Agentic System' architecture to map security risks, treating the integration of 'Probabilistic Planner' + 'Deterministic Tools' + 'Vector Memory' as the core architectural novelty requiring specific defense.

Comparison to Prior Work

vs. OWASP Top 10 for LLM: This paper extends beyond model-centric risks to cover system-level agent interactions (tools, memory, workflow) [cited in paper]
vs. MITRE ATLAS: This paper provides a more granular taxonomy specifically for *agentic* workflows and design dimensions rather than general AI threats [cited in paper]
vs. Prior Agent Surveys (e.g., Zhang et al., 2025): This work covers the entire system lifecycle (attack & defense) rather than focusing only on specific vectors like prompt injection [cited in paper]

Limitations

Survey scope is limited to papers published between 2023 and October 2025
Excludes attacks targeting model internals (like model inversion) to focus on agent-specific risks
Effectiveness of defenses is based on reported results in literature, not new empirical verification in this specific text

Reproducibility

This is a survey paper. The methodology for paper selection (keywords, venues) is detailed in Section 2. The list of reviewed papers (128 total) serves as the dataset.

📊 Experiments & Results

Evaluation Setup

Systematic Literature Review (SLR) and Taxonomy Construction

Benchmarks:

Literature Corpus (Survey Analysis) [New]

Metrics:

Count of identified attack vectors
Count of defense mechanisms
Coverage of design dimensions
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	This Paper	Δ
Literature Corpus	Total Papers Reviewed	128	+128
Literature Corpus	Identified Attack Methods	51	+51
Literature Corpus	Identified Defense Methods	60	+60

Experiment Figures

The comprehensive attack landscape taxonomy

Main Takeaways

Agent security requires a fundamental shift from component-level defense (securing the LLM) to system-level defense (securing the workflow and tools)
The 'Action' and 'Tool' dimensions introduce the most critical risks, as they bridge the gap between model hallucination and real-world damage (e.g., data exfiltration)
Current defenses are fragmented; there is a significant gap in holistic frameworks that protect the entire agent lifecycle from input to tool execution

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Model (LLM) inference
Basic cybersecurity concepts (CIA triad, Injection attacks)
Familiarity with Agent architectures (Tools, Memory, Planning)

Key Terms

Agentic AI: Hybrid systems combining LLMs with non-AI components (tools, memory) to autonomously execute tasks

Indirect Prompt Injection: Attacks where malicious instructions are embedded in external data (e.g., webpages) that the agent retrieves and processes

CIA Triad: Confidentiality, Integrity, and Availability—the three pillars of information security

MCP: Model Context Protocol—a standard for connecting AI assistants to systems and data

RAG: Retrieval-Augmented Generation—fetching external data to ground LLM responses

SSRF: Server-Side Request Forgery—a vulnerability where an attacker forces a server to make requests to internal resources

XSS: Cross-Site Scripting—injecting malicious scripts into trusted websites

PII: Personally Identifiable Information—sensitive data like names or financial records