AgentGuard: Runtime Verification of AI Agents

📝 Paper Summary

Agentic AI Safety · Formal Verification of AI Agents · Runtime Verification

AgentGuard is a middleware framework that continuously learns a probabilistic model of an AI agent's behavior at runtime to verify quantitative safety and liveness properties in real-time.

Core Problem

Autonomous AI agents exhibit non-deterministic, emergent behaviors and stochastic failures (e.g., hallucinations, loops) that traditional static verification methods cannot predict or quantify.

Why it matters:

Agentic systems in critical sectors (e.g., finance, hardware verification) face high risks due to unpredictability and susceptibility to new vulnerabilities like prompt injection.
Existing verification focuses on static output checking or process conformance and therefore fails to capture the emergent probabilistic behavior that arises during execution.
The critical question shifts from 'Will the system fail?' (binary) to 'What is the probability of failure within a given budget?' (probabilistic), which current tools don't answer.

Concrete Example: In automated program repair, an agent might get stuck in a loop of hypothesizing and searching without ever writing a fix. A static check sees only valid tool calls and so misses that the probability of success drops toward zero as the agent burns through its budget.

Key Novelty

Dynamic Probabilistic Assurance via Digital Twins

Treats the agent as a black box and observes its Input/Output (I/O) to dynamically build a 'digital twin' modeled as a Markov Decision Process (MDP).
Uses online learning to update transition probabilities in the MDP based on observed frequencies of agent actions and tool outcomes.
Applies probabilistic model checking (PMC) on this evolving model to calculate real-time guarantees, such as the probability of reaching a success state.

Evaluation Highlights

Demonstrates the ability to learn execution patterns, such as a 75% probability of using 'search_code_base' vs 25% for 'find_similar_api_calls' after hypothesizing.
Enables calculation of quantitative properties like expected cycles to completion (E_min) to detect inefficiencies or loops.
Successfully integrates as a middleware layer into RepairAgent, an existing autonomous bug-fixing system.
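To make the expected-cycles metric concrete: if the agent's behavior is fixed, the learned model reduces to a Markov chain, and the expected number of cycles to completion satisfies E(s) = 1 + Σ_t P(s, t)·E(t) with E(goal) = 0. The sketch below is a hypothetical illustration (state names and probabilities are invented, not taken from the paper) that solves this by fixed-point iteration. It shows how raising the hypothesize→search loop probability inflates the expected cycles. E_min on a full MDP would additionally minimize over the available actions in each state.

```python
# Sketch only: states and probabilities are hypothetical, not values reported in the paper.
Chain = dict[str, dict[str, float]]


def expected_steps(chain: Chain, goal: str, tol: float = 1e-10,
                   max_iter: int = 1_000_000) -> dict[str, float]:
    """Expected number of transitions until `goal` is reached, from every state.

    Iterates E(s) = 1 + sum_t P(s, t) * E(t) with E(goal) = 0, starting from E = 0.
    Assumes `goal` is reached with probability 1 from every state; otherwise the
    true expectation is infinite and the iteration does not converge.
    """
    values = {s: 0.0 for s in chain}
    for _ in range(max_iter):
        new = {
            s: 0.0 if s == goal else 1.0 + sum(p * values[t] for t, p in succ.items())
            for s, succ in chain.items()
        }
        delta = max(abs(new[s] - values[s]) for s in chain)
        values = new
        if delta < tol:
            break
    return values


def repair_chain(p_loop: float) -> Chain:
    """Hypothetical repair loop: after hypothesizing, search again with probability p_loop."""
    return {
        "hypothesize": {"search": p_loop, "write_fix": 1.0 - p_loop},
        "search": {"hypothesize": 1.0},
        "write_fix": {"fixed": 0.5, "hypothesize": 0.5},
        "fixed": {},
    }


if __name__ == "__main__":
    for p in (0.8, 0.95):
        e = expected_steps(repair_chain(p), goal="fixed")["hypothesize"]
        print(f"p_loop={p}: expected cycles to completion = {e:.1f}")
```

For this particular chain the exact answer is 4 / (1 − p_loop), so the expected cycles grow from 20 to 80 as p_loop goes from 0.8 to 0.95. A monitor watching this number rise is one way to flag aimless exploration.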

Breakthrough Assessment

7/10

Proposes a novel, practical middleware approach to runtime verification of agents. Although a proof of concept (POC) is demonstrated, the paper lacks the extensive large-scale empirical benchmarks typical of standard ML papers.

⚙️ Technical Details

Problem Definition

Setting: Runtime verification of autonomous agents operating in non-stationary environments.

Inputs: Stream of raw Agent I/O (LLM calls, tool invocations, observations).

Outputs: Quantitative assurance metrics (e.g., probability of success P_max, expected cost E_min) and alerts based on safety thresholds.

Pipeline Flow

Agent executes actions → Trace Monitor captures I/O
Event Abstractor converts I/O to formal State/Action events
Online Model Learner updates MDP transition probabilities
Probabilistic Model Checker verifies PCTL properties against MDP
Dashboard/Actuator reports metrics or triggers intervention

System Modules

Trace Monitor & Event Abstractor

Instruments the agent framework to capture raw I/O and abstract it into formal events (State_A -> Action_1 -> State_B).

Model or implementation: Python-based logging middleware
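A minimal sketch of the abstraction step, assuming a developer-supplied table that maps each tool name to the abstract state it leads to (the Limitations note that this mapping is defined manually). The table entries, the `Event` type, and `abstract_trace` are hypothetical illustrations, not AgentGuard's actual RepairAgent configuration. Unstructured content such as free-form thoughts and tool arguments is dropped.

```python
# Hypothetical developer-defined abstraction: tool names and abstract states are
# illustrative assumptions, not AgentGuard's actual configuration for RepairAgent.
from dataclasses import dataclass
from typing import Iterable, Iterator

TOOL_TO_STATE = {
    "express_hypothesis": "hypothesizing",
    "search_code_base": "collecting_info",
    "find_similar_api_calls": "collecting_info",
    "write_fix": "fix_written",
}


@dataclass(frozen=True)
class Event:
    state: str
    action: str
    next_state: str


def abstract_trace(raw_io: Iterable[dict], initial_state: str = "start") -> Iterator[Event]:
    """Turn raw I/O records into (state, action, next_state) events."""
    state = initial_state
    for record in raw_io:
        tool = record.get("tool")
        if tool not in TOOL_TO_STATE:
            continue  # free-form reasoning or unmapped tools produce no event
        next_state = TOOL_TO_STATE[tool]
        yield Event(state, tool, next_state)  # tool arguments are deliberately dropped
        state = next_state


if __name__ == "__main__":
    trace = [
        {"tool": "express_hypothesis", "args": {"text": "off-by-one in loop bound"}},
        {"thought": "Check where the index is computed."},
        {"tool": "search_code_base", "args": {"keywords": ["index", "bound"]}},
        {"tool": "write_fix", "args": {"patch": "<diff omitted>"}},
    ]
    for event in abstract_trace(trace):
        print(event)
```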

Online Model Learner

Continuously updates the Agentic MDP (AMDP) structure and transition probabilities based on observed event frequencies.

Model or implementation: Frequency-based online learner
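A frequency-based learner can be as simple as two nested counters: maximum-likelihood estimates P̂(s′ | s, a) = N(s, a, s′) / N(s, a), plus the empirical frequency with which each action is chosen in a state. The class below is a hypothetical sketch, not code from the AgentGuard repository. In the usage example, the 3:1 counts are invented purely to reproduce the 0.75 / 0.25 split after hypothesizing that is quoted in this summary.

```python
# Sketch of a frequency-based (maximum-likelihood) online learner; the class and
# method names are hypothetical, not taken from the AgentGuard code base.
from collections import Counter, defaultdict


class FrequencyLearner:
    def __init__(self) -> None:
        self.transitions: defaultdict = defaultdict(Counter)  # (state, action) -> next-state counts
        self.choices: defaultdict = defaultdict(Counter)      # state -> counts of actions taken

    def update(self, state: str, action: str, next_state: str) -> None:
        """Incorporate one observed event; constant work per event, so it can run online."""
        self.transitions[(state, action)][next_state] += 1
        self.choices[state][action] += 1

    @staticmethod
    def _normalize(counter: Counter) -> dict[str, float]:
        total = sum(counter.values())
        return {key: n / total for key, n in counter.items()} if total else {}

    def transition_probs(self, state: str, action: str) -> dict[str, float]:
        """Estimated P(next_state | state, action) = N(s, a, s') / N(s, a)."""
        return self._normalize(self.transitions.get((state, action), Counter()))

    def action_frequencies(self, state: str) -> dict[str, float]:
        """Empirical frequency with which each action is chosen in `state`."""
        return self._normalize(self.choices.get(state, Counter()))


if __name__ == "__main__":
    learner = FrequencyLearner()
    for tool in ["search_code_base"] * 3 + ["find_similar_api_calls"]:  # invented counts
        learner.update("hypothesizing", tool, "collecting_info")
    print(learner.action_frequencies("hypothesizing"))
```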

Probabilistic Model Checker

Performs quantitative verification on the learned MDP against pre-defined properties.

Model or implementation: Storm Model Checker (via stormpy bindings)
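To show what a query such as Pmax=? [F "fixed"] asks of the checker, the sketch below runs plain value iteration over a tiny hypothetical MDP (states, actions, and probabilities are invented). This is an illustration of the computation, not Storm's implementation. Storm offers more robust solvers, and the simple stopping rule here does not bound the remaining error in general.

```python
# Illustration of what Pmax=? / Pmin=? [F "fixed"] compute: plain value iteration over a
# tiny hypothetical MDP. Storm provides more robust (e.g. sound) solution methods.
MDP = dict[str, dict[str, dict[str, float]]]  # state -> action -> next_state -> probability


def reach_prob(mdp: MDP, targets: set[str], maximize: bool = True,
               tol: float = 1e-12, max_iter: int = 100_000) -> dict[str, float]:
    """Optimal probability of eventually reaching `targets` from each state.

    Non-target states start at 0 and the Bellman operator is iterated, converging from
    below to the least fixed point (the true P_max or P_min). The `tol` check is a
    heuristic stopping rule only.
    """
    opt = max if maximize else min
    x = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(max_iter):
        new = {}
        for s, actions in mdp.items():
            if s in targets:
                new[s] = 1.0
            elif not actions:
                new[s] = 0.0  # absorbing non-target state
            else:
                new[s] = opt(sum(p * x[t] for t, p in dist.items()) for dist in actions.values())
        delta = max(abs(new[s] - x[s]) for s in mdp)
        x = new
        if delta < tol:
            break
    return x


repair_mdp: MDP = {
    "hypothesizing": {
        "write_fix": {"fixed": 0.4, "failed": 0.6},
        "search_code_base": {"collecting_info": 1.0},
    },
    "collecting_info": {"write_fix": {"fixed": 0.7, "failed": 0.3}},
    "fixed": {},
    "failed": {},
}

if __name__ == "__main__":
    print("P_max:", reach_prob(repair_mdp, {"fixed"})["hypothesizing"])
    print("P_min:", reach_prob(repair_mdp, {"fixed"}, maximize=False)["hypothesizing"])
```

The gap between P_max (searching first, 0.7) and P_min (fixing immediately, 0.4) is the kind of strategy-dependent range that the quantitative assurance metrics report.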

Dashboard / Actuator

Visualizes results and triggers automated responses if safety thresholds are breached.

Model or implementation: User Interface / Callback system
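A callback-based actuator reduces to comparing each verified metric against a configured bound. The sketch below assumes metrics arrive as a name-to-value dictionary (for example "P_max" and "E_min"). The `Threshold` type, `check_and_act`, and the bounds in the usage example are hypothetical.

```python
# Hypothetical threshold/callback layer; names and bounds are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Threshold:
    name: str
    metric: str   # key in the metrics produced by the model checker, e.g. "P_max"
    bound: float
    upper: bool   # True: breach when value > bound; False: breach when value < bound


def check_and_act(metrics: dict[str, float], thresholds: list[Threshold],
                  on_breach: Callable[[Threshold, float], None]) -> list[str]:
    """Compare verified metrics against thresholds and fire the callback on each breach."""
    breached = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric not computed in this verification round
        if (value > t.bound) if t.upper else (value < t.bound):
            on_breach(t, value)
            breached.append(t.name)
    return breached


if __name__ == "__main__":
    rules = [
        Threshold("low_success", "P_max", 0.5, upper=False),
        Threshold("likely_looping", "E_min", 30.0, upper=True),
    ]
    log: list[str] = []
    hits = check_and_act({"P_max": 0.7, "E_min": 80.0}, rules,
                         lambda t, v: log.append(f"{t.name}: {t.metric}={v}"))
    print(hits, log)
```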

Novel Architectural Elements

Integration of an online MDP learner directly with a probabilistic model checker (Storm) in a middleware loop for agent control.
Abstraction layer that maps unstructured LLM tool usage to formal MDP states/actions in real-time.
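One plausible way to connect an online learner to Storm is to re-export the learned counts as a PRISM `mdp` program and re-check it. The sketch below is an assumption about how such glue could look, not AgentGuard's code. `to_prism` and `check_with_storm` are hypothetical names. The stormpy calls follow stormpy's getting-started documentation and should be checked against the installed version. States with no observed action receive an explicit self-loop so the program has no deadlocks.

```python
# Sketch: export learned counts as a PRISM MDP so Storm can check it. Function names are
# hypothetical; the stormpy calls follow stormpy's getting-started guide.
import os
import tempfile


def to_prism(counts: dict[tuple[str, str], dict[str, int]], initial: str,
             labels: dict[str, set[str]]) -> str:
    """Render (state, action) -> next-state counts as a PRISM `mdp` with one integer variable."""
    states = sorted({s for s, _ in counts} | {t for succ in counts.values() for t in succ} | {initial})
    idx = {s: i for i, s in enumerate(states)}
    lines = ["mdp", "", "module agent", f"  s : [0..{len(states) - 1}] init {idx[initial]};"]
    for (state, action), succ in sorted(counts.items()):
        total = sum(succ.values())
        # Exact rational estimates N(s,a,s')/N(s,a), so each distribution sums to 1.
        branches = " + ".join(f"{n}/{total}:(s'={idx[t]})" for t, n in sorted(succ.items()))
        lines.append(f"  [{action}] s={idx[state]} -> {branches};")
    for s in sorted(set(states) - {st for st, _ in counts}):
        lines.append(f"  [] s={idx[s]} -> 1:(s'={idx[s]});")  # self-loop: no observed action
    lines += ["endmodule", ""]
    for name, members in labels.items():
        guard = " | ".join(f"s={idx[m]}" for m in sorted(members)) or "false"
        lines.append(f'label "{name}" = {guard};')
    return "\n".join(lines) + "\n"


def check_with_storm(prism_text: str, formula: str) -> float:
    """Model-check `formula` on the exported program; requires Storm's stormpy bindings."""
    import stormpy

    with tempfile.NamedTemporaryFile("w", suffix=".prism", delete=False) as f:
        f.write(prism_text)
        path = f.name
    try:
        program = stormpy.parse_prism_program(path)
        properties = stormpy.parse_properties(formula, program)
        model = stormpy.build_model(program, properties)
        result = stormpy.model_checking(model, properties[0])
        return result.at(model.initial_states[0])
    finally:
        os.remove(path)


if __name__ == "__main__":
    counts = {  # invented counts for illustration
        ("hypothesizing", "search_code_base"): {"collecting_info": 3},
        ("hypothesizing", "write_fix"): {"fixed": 2, "failed": 3},
        ("collecting_info", "write_fix"): {"fixed": 7, "failed": 3},
    }
    text = to_prism(counts, initial="hypothesizing", labels={"success": {"fixed"}})
    print(text)
    try:
        print(check_with_storm(text, 'Pmax=? [F "success"]'))
    except ImportError:
        print("stormpy not installed; skipping the Storm check")
```

Regenerating the whole program on every update is the simplest option. It also illustrates the re-verification overhead noted under Limitations.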

Modeling

Base Model: Storm (Model Checker)

Comparison to Prior Work

vs. VeriPlan: AgentGuard operates at runtime with online learning, whereas VeriPlan verifies static plans post-generation.
vs. Formal-LLM: Formal-LLM enforces structural validity (conformance) using automata, while AgentGuard analyzes probabilistic behavior and emergent risks (e.g., loops, failure probability).
vs. Saarthi: Saarthi uses agents to perform verification tasks; AgentGuard uses formal methods to verify the agents themselves.
vs. Monitor-LLM [not cited in paper]: Monitor-LLM uses a separate LLM to check traces against natural language requirements, whereas AgentGuard builds a formal mathematical model (MDP) for quantitative guarantees.

Limitations

Relies on developers to manually define the discrete state space and mapping logic.
Periodic re-verification of the entire model can introduce computational overhead for complex agents.
Currently assumes a fully observable state space (MDP), whereas real-world agents often face partial observability (POMDP).
Does not yet support stochastic games for analyzing multi-agent adversarial interactions.

Reproducibility

Code: https://github.com/rohamko/agentguard

The code is publicly available at https://github.com/rohamko/agentguard. The framework relies on the 'stormpy' Python bindings for the Storm model checker. The proof of concept targets RepairAgent, and a configuration for it is included.

📊 Experiments & Results

Evaluation Setup

Proof-of-Concept application to RepairAgent (an automated program repair agent) to demonstrate feasibility.

Benchmarks:

RepairAgent Case Study (Automated Program Repair, APR) [New]

Metrics:

Probability of success (P_max)
Expected cycles to completion (E_min)
Transition probabilities (learned behavior)
Statistical methodology: Not explicitly reported in the paper

Key Results

The paper provides a proof-of-concept demonstration rather than a comparative benchmark study; the primary results are the behavioral probabilities learned from the RepairAgent integration.

Benchmark	Metric	Baseline	This Paper	Δ
RepairAgent Case Study	Transition probability (after hypothesizing → search_code_base)	Not applicable	0.75	Not applicable
RepairAgent Case Study	Transition probability (after hypothesizing → find_similar_api_calls)	Not applicable	0.25	Not applicable

Main Takeaways

Demonstrates that agent behaviors (like repair strategies) can be mapped to formal MDPs in real-time.
Shows that quantitative properties (probability of success, expected cost) can be calculated dynamically to guide resource allocation.
Highlights the potential for detecting infinite loops or aimless exploration by monitoring the 'Expected cycles to completion' metric.

📚 Prerequisite Knowledge

Prerequisites

Markov Decision Processes (MDPs)
Probabilistic Model Checking (PMC)
Linear Temporal Logic / PCTL
Agentic AI workflows (Tool use, ReAct loops)

Key Terms

AMDP: Agentic Markov Decision Process, an MDP whose states are snapshots of the agent's context and whose actions correspond to tool invocations.

PCTL: Probabilistic Computation Tree Logic, a logic used to state properties such as 'what is the probability that event X happens within K steps?'.
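A step-bounded PCTL query such as P=? [F<=K "fixed"] can be computed by K rounds of backward iteration: x₀(s) = 1 for target states and 0 otherwise, and x_{i+1}(s) = Σ_t P(s, t)·x_i(t) for non-targets. The sketch below applies this to a hypothetical discrete-time Markov chain whose states and probabilities are invented. For an MDP, P_max would take a maximum over actions at each step instead of a single sum.

```python
# Minimal sketch of the step-bounded query P=? [F<=k "fixed"] on a hypothetical
# discrete-time Markov chain; states and probabilities are illustrative only.
Chain = dict[str, dict[str, float]]


def bounded_reach(chain: Chain, targets: set[str], k: int) -> dict[str, float]:
    """Probability of reaching `targets` within k steps, for every state.

    x_0(s) = 1 if s is a target else 0; x_{i+1}(s) = 1 for targets,
    otherwise sum_t P(s, t) * x_i(t).
    """
    x = {s: 1.0 if s in targets else 0.0 for s in chain}
    for _ in range(k):
        x = {
            s: 1.0 if s in targets else sum(p * x[t] for t, p in succ.items())
            for s, succ in chain.items()
        }
    return x


agent = {
    "working": {"fixed": 0.1, "stuck": 0.1, "working": 0.8},
    "fixed": {"fixed": 1.0},
    "stuck": {"stuck": 1.0},
}

if __name__ == "__main__":
    for budget in (1, 5, 20):
        print(budget, round(bounded_reach(agent, {"fixed"}, budget)["working"], 4))
```

For this chain the exact value is 0.5·(1 − 0.8^K). It starts at 0.1 for K = 1 and never exceeds 0.5 however large the budget, which is exactly the "probability of success within a given budget" question.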

Runtime Verification (RV): A method of analyzing a system by observing its actual execution trace rather than proving properties about all possible executions.

Model Drift: Changes in the underlying statistical properties of the agent's behavior or environment over time.

Digital Twin: A dynamic virtual model (here, an MDP) that replicates the behavior of the physical/software system in real-time.

Storm: A specific probabilistic model checker tool used to verify properties of MDPs.

Hallucination: When an LLM generates outputs that are factually incorrect or logically flawed.

Conformance Checking: Verifying that a system's execution adheres to a pre-defined set of allowed rules or processes.
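Conformance checking answers only yes or no. The sketch below uses a hypothetical allowed-transition automaton (not Formal-LLM's or AgentGuard's actual specification) to show the gap from the program-repair example: a trace that loops between hypothesizing and searching forever still conforms, because every individual step is permitted.

```python
# Hypothetical allowed-transition automaton in the spirit of conformance checking;
# it is not Formal-LLM's or AgentGuard's actual specification.
ALLOWED = {
    ("start", "express_hypothesis"): "hypothesizing",
    ("hypothesizing", "search_code_base"): "collecting_info",
    ("collecting_info", "express_hypothesis"): "hypothesizing",
    ("collecting_info", "write_fix"): "fix_written",
}


def conforms(actions: list[str], start: str = "start") -> bool:
    """True iff every action is permitted in the state reached so far."""
    state = start
    for action in actions:
        next_state = ALLOWED.get((state, action))
        if next_state is None:
            return False
        state = next_state
    return True


if __name__ == "__main__":
    looping = ["express_hypothesis", "search_code_base"] * 50  # never writes a fix
    print(conforms(looping))        # True: every step is allowed
    print(conforms(["write_fix"]))  # False: a fix before any hypothesis is not allowed
```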