← Back to Paper List

AgentGuard: Runtime Verification of AI Agents

Roham Koohestani
arXiv (2025)
Agent Reasoning RL

📝 Paper Summary

Agentic AI Safety Formal Verification of AI Agents Runtime Verification
AgentGuard is a middleware framework that continuously learns a probabilistic model of an AI agent's behavior at runtime to verify quantitative safety and liveness properties in real-time.
Core Problem
Autonomous AI agents exhibit non-deterministic, emergent behaviors and stochastic failures (e.g., hallucinations, loops) that traditional static verification methods cannot predict or quantify.
Why it matters:
  • Agentic systems in critical sectors (e.g., finance, hardware verification) face high risks due to unpredictability and susceptibility to new vulnerabilities like prompt injection.
  • Existing verification focuses on static output checking or process conformance, failing to map the emergent probabilistic behavior that arises during execution.
  • The critical question shifts from 'Will the system fail?' (binary) to 'What is the probability of failure within a given budget?' (probabilistic), which current tools don't answer.
Concrete Example: In automated program repair, an agent might get stuck in a loop of hypothesizing and searching without ever writing a fix. A static check sees valid tool calls, but fails to realize the probability of success effectively drops to zero as the agent burns through its budget.
Key Novelty
Dynamic Probabilistic Assurance via Digital Twins
  • Treats the agent as a black box and observes its Input/Output (I/O) to dynamically build a 'digital twin' modeled as a Markov Decision Process (MDP).
  • Uses online learning to update transition probabilities in the MDP based on observed frequencies of agent actions and tool outcomes.
  • Applies probabilistic model checking (PMC) on this evolving model to calculate real-time guarantees, such as the probability of reaching a success state.
Evaluation Highlights
  • Demonstrates the ability to learn execution patterns, such as a 75% probability of using 'search_code_base' vs 25% for 'find_similar_api_calls' after hypothesizing.
  • Enables calculation of quantitative properties like expected cycles to completion (E_min) to detect inefficiencies or loops.
  • successfully integrates as a middleware layer into RepairAgent, an existing autonomous bug-fixing system.
Breakthrough Assessment
7/10
Proposes a novel, practical middleware approach to runtime verification for agents. While the POC is demonstrated, the paper lacks extensive large-scale empirical benchmarks compared to standard ML papers.
×