
Real-Time Trust Verification for Safe Agentic Actions using TrustBench

Tavishi Sharma, Vinayak Sharma, Pragya Sharma
arXiv (2026)
Agent Factuality Benchmark

📝 Paper Summary

Agentic AI Safety, Trust Calibration, Runtime Verification
TrustBench safeguards autonomous agents by intercepting actions before execution and verifying them against calibrated trust scores and domain-specific policies like citation integrity.
Core Problem
Current trust frameworks evaluate agents post-hoc (after actions occur), failing to prevent harmful outcomes in high-stakes domains like healthcare and finance where errors are irreversible.
Why it matters:
  • Reactive 'evaluate after failure' paradigms are dangerous when agents directly execute financial transactions or deliver medical advice
  • Standard metrics like ROUGE fail to capture reasoning soundness in agentic tasks lacking deterministic ground truths
  • Generic safety filters miss domain-specific nuances, such as the need for PubMed citations in medical advice vs. regulatory compliance in finance
Concrete Example: A healthcare agent recommending a dangerous medication dosage would be flagged by current benchmarks only after the recommendation is delivered to the user. TrustBench intercepts this by detecting a 'confidence-evidence mismatch' or lack of valid citations before execution.
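As an illustration of the interception idea in this example, here is a minimal sketch of a pre-execution gate. It is not TrustBench's actual code: `ProposedAction`, `verify_before_execution`, the `PMID:` citation format, the `evidence_score` input, and the `max_gap` threshold are all hypothetical names and values chosen for the sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProposedAction:
    """An action the agent wants to take (hypothetical structure)."""
    text: str
    confidence: float  # agent's self-reported confidence in [0, 1]
    citations: List[str] = field(default_factory=list)


# Assumed citation format for this sketch: "PMID:" followed by 1-8 digits.
PMID_PATTERN = re.compile(r"^PMID:\d{1,8}$")


def verify_before_execution(
    action: ProposedAction,
    evidence_score: float,
    max_gap: float = 0.3,  # assumed tolerance, not a value from the paper
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons). The action is blocked if any check fails."""
    reasons = []
    # Check 1: at least one well-formed citation must be attached.
    if not any(PMID_PATTERN.match(c) for c in action.citations):
        reasons.append("no valid citation")
    # Check 2: stated confidence must not exceed the evidence by too much.
    if action.confidence - evidence_score > max_gap:
        reasons.append("confidence-evidence mismatch")
    return (not reasons, reasons)


risky = ProposedAction("Take 10x the usual dose.", confidence=0.95)
allowed, reasons = verify_before_execution(risky, evidence_score=0.4)
print(allowed, reasons)
```

The key design point is that the check runs before the action reaches the user, so a failed check can block delivery rather than merely log an error afterwards.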
Key Novelty
Dual-Mode Epistemic Trust Verification
  • Combines offline benchmarking (to learn calibration curves mapping agent confidence to actual reliability) with online verification (real-time checks without ground truth)
  • Uses 'LLM-as-a-Judge' to evaluate reasoning quality (correctness, consistency) instead of just surface-level text overlap, creating a semantic basis for trust
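The two modes can be sketched as follows, assuming the calibration curve is fit with isotonic regression (the technique the assessment below mentions). This is a simplified pool-adjacent-violators implementation written for illustration, not the authors' code; `fit_isotonic`, `calibrated_trust`, and the sample data are hypothetical.

```python
from bisect import bisect_left


def fit_isotonic(confidences, outcomes):
    """Offline mode: fit a non-decreasing step function mapping confidence
    to observed accuracy using pool-adjacent-violators.
    Tied confidence values are not specially handled in this sketch."""
    blocks = []  # each block: [sum of outcomes, count, max confidence covered]
    for conf, outcome in sorted(zip(confidences, outcomes)):
        blocks.append([float(outcome), 1, conf])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrated_trust(curve, confidence):
    """Online mode: look up the calibrated reliability for a raw confidence.
    No ground truth is needed at this point."""
    thresholds, values = curve
    # First block whose upper bound covers the query, clamped to the last block.
    index = min(bisect_left(thresholds, confidence), len(values) - 1)
    return values[index]


# Hypothetical benchmark results: 1 means the agent's answer was judged correct.
curve = fit_isotonic([0.1, 0.3, 0.5, 0.7, 0.9], [0, 1, 0, 1, 1])
print(calibrated_trust(curve, 0.4))
```

The offline fit is the only step that needs labelled outcomes; at runtime the lookup is a binary search over a handful of thresholds, which is cheap enough to sit in the action path.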
Evaluation Highlights
  • Reduced harmful actions by 87% across healthcare, finance, and QnA tasks compared to unconstrained baselines
  • Domain-specific plugins achieved 35% greater harm reduction compared to generic verification policies
  • Maintained sub-200ms median end-to-end verification latency, enabling practical real-time deployment
Breakthrough Assessment
8/10
Significant shift from post-hoc evaluation to real-time intervention. The integration of isotonic calibration with runtime checks addresses a critical safety gap for autonomous agents.