A Survey on Agentic Security: Applications, Threats and Defenses

📝 Paper Summary

Agentic AI Security Adversarial Attacks on Agents Agent Defense Mechanisms

This survey structures the fragmented landscape of over 160 agentic security papers into three pillars—applications, threats, and defenses—revealing critical gaps like the monopoly of GPT backbones and underexplored modalities.

Core Problem

The rapid transition from passive LLMs to autonomous agents has introduced severe new vulnerabilities (e.g., indirect injection, goal hijacking) that existing safety measures for standalone models cannot address.

Why it matters:

Agents now execute actions in external environments, meaning attacks can cause tangible damage (e.g., executing malware, leaking private data) rather than just generating bad text
Current research is fragmented into isolated clusters (red teaming, governance, jailbreaking) without a unified framework connecting capabilities to their specific vulnerabilities and defenses
Standard LLM safety alignment (refusal training) does not reliably transfer to agentic contexts, leaving critical infrastructure vulnerable to simple jailbreaks

Concrete Example: A 'split-payload injection' attack can compromise an agent simply by embedding malicious instructions across different parts of a website. When the agent processes the site, it combines these parts and executes the payload, a vulnerability specific to the agent's information aggregation process that standalone LLMs don't face.

Key Novelty

Three-Pillar Taxonomy of Agentic Security

Structures the field into Applications (Red/Blue Teaming), Threats (Injection, Poisoning, Manipulation), and Defenses (Secure-by-Design, Runtime Protection)
Provides a cross-cutting analysis identifying structural trends, such as the shift from monolithic to planner-executor architectures and the risky monopoly of closed-source backbones (GPT-4)

Architecture

A comprehensive taxonomy tree of Agentic Security, organized into three main branches: Applications, Threats, and Defenses

Evaluation Highlights

Identifies that 83% of surveyed studies rely on GPT-family models, creating a dangerous single-point-of-failure risk for the ecosystem
reveals that planner-executor architectures (39.8%) and hybrid models (14%) are displacing monolithic agents, introducing new modular attack surfaces
Highlights that defense mechanisms are currently fragile; adversarial training often degrades task utility, and simple jailbreaks remain effective against complex agents

Breakthrough Assessment

9/10

The first holistic survey covering the entire agentic security lifecycle. It unifies scattered literature into a coherent framework, crucial for defining future research directions in this rapidly emerging field.

⚙️ Technical Details

Problem Definition

Setting: Systematization of Knowledge (SoK) regarding the security of LLM-based Agents

Inputs: Literature corpus of over 160 papers from 2024-2025

Outputs: Taxonomy, trend analysis, and gap identification for Agentic Security

Pipeline Flow

Applications (Offensive, Defensive, Domain-Specific)
Threats (Attack Surface, Evaluation Frameworks)
Defenses (Hardening, Operations, Evaluation)

System Modules

Applications Analysis

Categorize how agents are used in cybersecurity

Model or implementation: N/A (Survey)

Threats Analysis

Categorize vulnerabilities inherent to agents

Model or implementation: N/A (Survey)

Defenses Analysis

Categorize countermeasures and hardening techniques

Model or implementation: N/A (Survey)

Novel Architectural Elements

Holistic three-pillar taxonomy linking Applications, Threats, and Defenses specifically for Agents (vs. generic LLMs)
Cross-cutting analysis framework that correlates Architecture (Monolithic vs. Modular) with Security Risks

Modeling

Base Model: N/A (Survey Paper)

Comparison to Prior Work

vs. Yu et al. (2025): Covers Applications extensively (Red/Blue teaming) which Yu et al. omit
vs. Raza et al. (2025): Provides deep technical coverage of threats and specific defenses, whereas Raza focuses on high-level governance
vs. Deng et al. (2024c): Covers Defenses and Applications in addition to Threats, providing a holistic view
+ 1 more
vs. Ma et al. (2025): Focuses specifically on the unique attack surface of Agents (tools, memory, planning) rather than general LLM safety

Limitations

Does not explore physical-world or embodied agent attacks (robots, sensors) in detail
Limited to academic papers; may miss proprietary industrial research
Benchmarks reviewed often use synthetic setups, limiting real-world applicability assessment
Most surveyed studies ignore practical constraints like cost, latency, and energy usage

Reproducibility

Code: https://github.com/kagnlp/Awesome-Agentic-Security

The authors provide a continuously updated list of all 160+ surveyed papers at https://github.com/kagnlp/Awesome-Agentic-Security.

📊 Experiments & Results

Evaluation Setup

Cross-cutting statistical analysis of 160+ papers

Metrics:

Distribution of Agent Architectures
Prevalence of LLM Backbones
Usage of Input Modalities
Adoption of Knowledge Sources
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Analysis of architectural trends reveals a shift away from monolithic designs.
Surveyed Papers	Percentage	24.6	39.8	15.2
Analysis of model backbones highlights a significant concentration risk.
Surveyed Papers	Number of Papers	71	126	55

Experiment Figures

Statistical charts showing the distribution of Architectures, Roles, LLM Backbones, Knowledge Sources, and Data Modalities across the surveyed papers

Main Takeaways

The field is rapidly moving toward modular 'Planner-Executor' and Hybrid agent architectures to improve control and debuggability
There is a dangerous monopoly of GPT-based models; open-weights models (Llama, Mistral) are significantly underrepresented in security-critical agent research
Non-textual modalities (images, binaries, network traces) are severely underexplored despite their importance in real-world security contexts like malware analysis
Most systems rely on pre-trained knowledge bases (RAG/ICL) rather than fine-tuning, prioritizing deployment speed over the robustness of internalized security knowledge

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Models (LLMs) and their limitations
Basic cybersecurity concepts (Red Teaming, Blue Teaming, Injection, Fuzzing)
Familiarity with agent architectures (Planning, Tool use, Memory)

Key Terms

LLM Agent: A system where an LLM acts as a decision maker to plan, invoke tools, and act in an environment while maintaining state

Red Teaming: Offensive security testing where agents act as attackers to find vulnerabilities in systems

Blue Teaming: Defensive security operations where agents monitor, detect, and respond to threats

Prompt Injection: Attacks that embed malicious instructions in the input to manipulate the model's behavior

Indirect Prompt Injection: Attacks where the agent consumes malicious content from an external source (e.g., a webpage) rather than a direct user prompt

Jailbreak: Techniques to bypass a model's safety alignment and refusal training

RAG: Retrieval-Augmented Generation—fetching external data to ground the model's responses

CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps

Goal Hijacking: Attacks that alter the agent's primary objective to serve a malicious secondary goal

Reward Hacking: Exploiting flaws in a reinforcement learning reward function to maximize score without achieving the intended outcome