Policy Compiler for Secure Agentic Systems

📝 Paper Summary

Agent Security, Policy Enforcement, Authorization

PCAS instruments agentic systems to enforce authorization policies deterministically by maintaining a fine-grained dependency graph of all actions and checking declarative rules before execution.

Core Problem

Current agent systems rely on prompt-based instructions for security, which offer no enforcement guarantees, while linear message logs fail to capture the causal dependencies needed for robust authorization.

Why it matters:

Prompt-based policies are ambiguous and easily bypassed by prompt injection or model error, leading to data exfiltration or unauthorized actions
Real-world authorization often depends on provenance (e.g., 'approve only if derived from X'), which linear logs obscure
Compliance violations in sectors like healthcare or customer service can have severe legal and safety consequences

Concrete Example: A policy states 'access medical records only after supervisor approval.' An agent that only logs messages linearly cannot tell *which* specific approval causally preceded a request, or whether the approval was faked via prompt injection. PCAS tracks the causal graph to ensure that a genuine approval event explicitly enables the access event.
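The approval check in this example can be sketched as a query over a small dependency graph. This is a minimal illustration, not the paper's implementation: the `Event` schema, the field names, and the role strings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One node in the dependency graph (hypothetical schema)."""
    id: str
    kind: str                      # e.g. "approval", "tool_result", "access_request"
    actor_role: str                # role of the principal that produced the event
    record: str | None = None      # medical record the event refers to, if any
    parents: tuple[str, ...] = ()  # ids of events this one causally depends on


def ancestors(graph: dict[str, Event], event_id: str) -> set[str]:
    """Return ids of all events that event_id transitively depends on."""
    seen: set[str] = set()
    stack = list(graph[event_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current].parents)
    return seen


def access_allowed(graph: dict[str, Event], request_id: str) -> bool:
    """Allow only if a supervisor approval for the same record is an ancestor."""
    request = graph[request_id]
    return any(
        graph[a].kind == "approval"
        and graph[a].actor_role == "supervisor"
        and graph[a].record == request.record
        for a in ancestors(graph, request_id)
    )


def build(*events: Event) -> dict[str, Event]:
    """Index events by id."""
    return {e.id: e for e in events}


# Genuine approval: the request descends from a supervisor's approval event.
genuine = build(
    Event("e1", "approval", "supervisor", record="rec-7"),
    Event("e2", "access_request", "agent", record="rec-7", parents=("e1",)),
)

# Faked approval: text claiming approval arrives inside an external tool result,
# so no supervisor-issued approval event exists among the ancestors.
faked = build(
    Event("f1", "tool_result", "external", record="rec-7"),
    Event("f2", "access_request", "agent", record="rec-7", parents=("f1",)),
)
```

Under these assumptions `access_allowed(genuine, "e2")` holds and `access_allowed(faked, "f2")` does not, because the check looks at who produced each ancestor event rather than at what the message text claims.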

Key Novelty

Policy Compiler for Agentic Systems (PCAS)

Compiles declarative policies (Datalog) and existing agent code into a secure system that intercepts every action via a reference monitor
Models the entire agent state (messages, tool calls, results) as a dependency graph rather than a linear log to track information flow and causal provenance
Enforces security deterministically outside the LLM, meaning the policy holds even if the LLM is compromised or hallucinates

Evaluation Highlights

Improves policy compliance from 48% to 93% on average across frontier models (Claude Opus 4.5, GPT-5.2, Gemini 3 Pro) in customer service tasks
Blocks 100% of policy violations (zero violating actions executed) in instrumented runs on customer service benchmarks
Effective against prompt injection: blocks unauthorized actions even when models are successfully manipulated by adversarial inputs

Breakthrough Assessment

8/10

Strong conceptual advance by moving security from 'prompt engineering' to deterministic runtime enforcement via dependency graphs. High impact for enterprise agent deployment.

⚙️ Technical Details

Problem Definition

Setting: Runtime enforcement of authorization policies in multi-agent systems

Inputs: Existing agent implementation + Declarative policy specification (Datalog-derived)

Outputs: Instrumented agentic system with a reference monitor that blocks violating actions

Pipeline Flow

Policy Compilation (translates Datalog rules to Rust)
Agent Instrumentation (hooks into agent framework to trap actions)
Runtime Execution (Agent proposes action -> Reference Monitor checks Graph -> Allow/Block)
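The runtime stage of this flow can be sketched as a wrapper that sits between the agent and its tools. Everything here is an assumption made for illustration: the class name, the policy signature, and the tool names are hypothetical, and a flat list of executed actions stands in for the dependency graph that PCAS actually checks.

```python
class ActionBlocked(Exception):
    """Raised when the monitor refuses to run a proposed action."""


class ReferenceMonitor:
    """Default-deny gate: every tool call must pass the policy before it runs."""

    def __init__(self, policy, tools):
        self.policy = policy    # callable(action, args, history) -> bool
        self.tools = tools      # action name -> Python callable
        self.history = []       # executed (action, args) pairs; stand-in for the graph

    def execute(self, action, args):
        # Unknown tools and policy failures are both rejected before execution.
        if action not in self.tools or not self.policy(action, args, self.history):
            raise ActionBlocked(action)
        result = self.tools[action](**args)
        self.history.append((action, args))
        return result


def refund_policy(action, args, history):
    """Hypothetical rule: a refund requires a prior identity check for that user."""
    if action != "refund":
        return True
    return ("verify_identity", {"user": args["user"]}) in history


tools = {
    "verify_identity": lambda user: f"verified {user}",
    "refund": lambda user, amount: f"refunded {amount} to {user}",
}

monitor = ReferenceMonitor(refund_policy, tools)

# The agent proposes a refund before verifying identity: blocked.
try:
    monitor.execute("refund", {"user": "alice", "amount": 20})
    blocked_first = False
except ActionBlocked:
    blocked_first = True

# After the identity check has actually executed, the same proposal is allowed.
monitor.execute("verify_identity", {"user": "alice"})
receipt = monitor.execute("refund", {"user": "alice", "amount": 20})
```

The decision is made by ordinary code outside the model, so it does not depend on what the LLM was prompted with or persuaded to propose.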

System Modules

Policy Compiler

Translates high-level Datalog policy rules into executable code for the monitor

Model or implementation: Differential Datalog (DDlog)
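To give a feel for what a compiled rule computes, the sketch below hand-evaluates a two-rule Datalog program in Python. The rules, the relation names `edge` and `depends`, and the event names are hypothetical; the summary states that the real pipeline emits Rust, and this naive fixpoint loop is only an illustration of the semantics.

```python
# Hypothetical Datalog rules being evaluated:
#   depends(X, Y) :- edge(X, Y).
#   depends(X, Z) :- edge(X, Y), depends(Y, Z).
# edge(X, Y) is read as "event X directly depends on event Y".


def derive_depends(edges):
    """Naive bottom-up evaluation: apply the rules until nothing new is derived."""
    depends = set(edges)                      # rule 1: every edge is a dependency
    while True:
        derived = {
            (x, z)
            for (x, y) in edges
            for (y2, z) in depends
            if y == y2                        # rule 2: join on the shared variable Y
        }
        if derived <= depends:                # fixpoint reached
            return depends
        depends |= derived


edges = {
    ("tool_call", "llm_message"),
    ("llm_message", "user_request"),
}
depends = derive_depends(edges)
```

Here `depends` ends up containing the two original edges plus the derived pair `("tool_call", "user_request")`, which is the kind of transitive fact an authorization rule can then query.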

Dependency Graph Tracker (Runtime Enforcement)

Maintains a dynamic graph of all agent events (messages, tool inputs/outputs) and their causal links

Model or implementation: Graph Database / In-memory Structure

Reference Monitor (Runtime Enforcement)

Intercepts every pending action, queries the policy engine against the dependency graph, and authorizes or blocks

Model or implementation: Deterministic Logic Engine

Novel Architectural Elements

Fine-grained dependency graph replacing linear message history as the basis for authorization decisions
Separation of policy logic (Datalog) from agent logic (LLM prompting), enforcing security via compilation rather than prompt engineering

Modeling

Base Model: Evaluated on Claude Opus 4.5, GPT-5.2, Gemini 3 Pro

Compute: Not reported in the paper

Comparison to Prior Work

vs. TrustAgent/LlamaGuard: PCAS provides deterministic enforcement via reference monitor, whereas prompt/classifier methods are probabilistic and bypassable
vs. FIDES: PCAS supports general-purpose authorization policies (approval workflows, role-based access) via Datalog, not just taint tracking for prompt injection
vs. NeMo Guardrails: PCAS models causal dependencies across multi-agent history, whereas Colang focuses on linear dialog flow and single-turn constraints
vs. Progent: PCAS tracks transitive provenance (history of data), whereas Progent checks only immediate tool call arguments
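The difference between checking immediate inputs and checking transitive provenance can be illustrated with a hop-limited graph walk. This is a sketch under stated assumptions: the event names and the `upstream` helper are hypothetical, and the one-hop case is a simplified picture of argument-level checking, not a model of any specific system compared above.

```python
def upstream(parents, node, max_hops=None):
    """Events reachable from node via parent links, optionally capped at max_hops."""
    found = set()
    frontier = {node}
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        # Step one hop further back, skipping events already visited.
        frontier = {p for n in frontier for p in parents.get(n, ())} - found
        found |= frontier
        hops += 1
    return found


# Hypothetical trace: an email call built from an LLM summary of a fetched web page.
parents = {
    "send_email_call": ["llm_summary"],
    "llm_summary": ["web_page_result"],
    "web_page_result": [],
}
untrusted = {"web_page_result"}

# One hop back: only the LLM summary is visible, so nothing looks untrusted.
immediate_hit = bool(upstream(parents, "send_email_call", max_hops=1) & untrusted)

# Full provenance: the untrusted web page is found two hops back.
transitive_hit = bool(upstream(parents, "send_email_call") & untrusted)
```

In this trace `immediate_hit` is `False` and `transitive_hit` is `True`: the untrusted source only becomes visible once the whole history of the data is followed.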

Limitations

Requires policies to be specifiable in Datalog, which may be difficult for non-technical users compared to natural language
Graph tracking adds overhead; performance impact on extremely long contexts/traces not fully characterized
Does not prevent model confusion or poor task performance; it only blocks unauthorized actions (a safety guarantee, not a liveness guarantee that tasks will complete)
Does not explore statistical counterfactual causality, only explicit dependency causality

Reproducibility

Code: https://github.com/wi-pi/pcas

Code will be released at the repository listed above. Evaluation uses standard benchmarks (AgentDojo, Tau-bench) and custom case studies. Specific prompt templates or model weights for the agents are not detailed in the main text.

📊 Experiments & Results

Evaluation Setup

Three case studies: Prompt Injection Defense, Pharmacovigilance Approval Workflow, and Customer Service Policy Compliance

Benchmarks:

Tau-bench (modified) (Customer service agent tasks)
Pharmacovigilance Workflow (Multi-agent approval simulation) [New]
Prompt Injection Scenarios (Adversarial input handling) [New]

Metrics:

Policy Compliance Rate (%)
Violation Rate
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Tau-bench (Customer Service)	Compliance Rate (%)	48	93	+45 pp
Tau-bench (Customer Service)	Policy Violations (count)	Not reported	0	Not reported
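The headline numbers can be read in more than one way, so the arithmetic is worth spelling out. The sketch below uses only the two averages from the table; the variable names are illustrative.

```python
baseline_pct = 48.0   # average compliance without PCAS
pcas_pct = 93.0       # average compliance with PCAS

# Absolute gain, in percentage points.
delta_pp = pcas_pct - baseline_pct

# Relative gain implied by the averages.
ratio = pcas_pct / baseline_pct

# Share of previously non-compliant runs that become compliant: 1 - 7/52.
noncompliant_reduction = 1 - (100 - pcas_pct) / (100 - baseline_pct)
```

On the averages this gives a gain of 45 percentage points, a ratio of about 1.94x, and roughly an 87% reduction in non-compliant runs. Any larger ratio would have to come from an individual model rather than from these averages.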

Main Takeaways

PCAS consistently improves compliance rates (up to ~2.9x improvement) by blocking unauthorized actions that base models would otherwise execute.
The system provides 'defense in depth': even if the underlying LLM is jailbroken or confused, the reference monitor prevents the actual execution of the harmful tool call.
Dependency graph tracking handles complex authorization logic such as 'supervisor approval required', which linear logs cannot verify robustly.

📚 Prerequisite Knowledge

Prerequisites

Attribute-Based Access Control (ABAC)
Datalog (logic programming)
Information Flow Control (IFC)
LLM Agent Architectures (Tools, Planning)

Key Terms

PCAS: Policy Compiler for Agentic Systems—the proposed framework for compiling policies into runtime monitors

Datalog: A declarative logic programming language used here to define security rules and query the dependency graph

dependency graph: A data structure representing causal relationships between agent events (messages, tool calls) to track provenance

reference monitor: A trusted component that intercepts all system actions to check if they are authorized before execution

ABAC: Attribute-Based Access Control—an authorization model where access is granted based on attributes of the user, resource, and environment
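As a quick illustration of the ABAC idea, the decision below depends on attributes of the user, the resource, and the environment, not on a fixed access list. All attribute names and values are hypothetical.

```python
def abac_allows(user, resource, environment):
    """Grant access only when user, resource, and environment attributes all line up."""
    return (
        user["role"] == "clinician"
        and user["department"] == resource["department"]
        and environment["on_shift"]
    )


clinician = {"role": "clinician", "department": "cardiology"}
chart = {"type": "medical_record", "department": "cardiology"}

allowed_on_shift = abac_allows(clinician, chart, {"on_shift": True})
allowed_off_shift = abac_allows(clinician, chart, {"on_shift": False})
```

The same user and resource yield different decisions once the environment attribute changes, which is what separates ABAC from a purely role-based check.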

provenance: The history of where a piece of data came from and how it was processed (e.g., which tool output influenced this decision)

prompt injection: An attack where adversarial instructions are hidden in input data to manipulate an LLM's behavior

linear message history: The standard way agents store context (a sequential list of messages), which PCAS argues is insufficient for security

Differential Datalog: An incremental computation engine for Datalog used to efficiently update and query the policy state
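The benefit of incremental evaluation can be sketched with a reachability relation that is updated per inserted edge and never recomputed from scratch. This is a simplified illustration that handles insertions only; the class and method names are hypothetical and are not the Differential Datalog API.

```python
class IncrementalReach:
    """Maintains the transitive closure of a growing graph, one edge at a time."""

    def __init__(self):
        self.reach = set()   # (src, dst) pairs: dst is reachable from src

    def add_edge(self, u, v):
        """Insert edge u -> v and return only the newly derived pairs."""
        # Everything that already reaches u, plus u itself.
        sources = {u} | {a for (a, b) in self.reach if b == u}
        # Everything already reachable from v, plus v itself.
        targets = {v} | {b for (a, b) in self.reach if a == v}
        new_pairs = {(s, t) for s in sources for t in targets} - self.reach
        self.reach |= new_pairs
        return new_pairs


graph = IncrementalReach()
first = graph.add_edge("a", "b")
second = graph.add_edge("c", "d")
bridge = graph.add_edge("b", "c")    # connects the two existing fragments
repeat = graph.add_edge("a", "b")    # already known: nothing new is derived
```

Adding the bridging edge derives exactly four new facts, `("b", "c")`, `("b", "d")`, `("a", "c")`, and `("a", "d")`, and the repeated insertion derives none. The work done per update scales with what changed, which matters when a monitor has to re-check policy state on every agent action.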

frontier models: The most advanced current LLMs (e.g., GPT-5.2 and Claude Opus 4.5, as cited in the paper)