Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Serhii Hovorov, Sofiia Pidturkina
arXiv (2026)
Agent Benchmark

📝 Paper Summary

AI Safety for Agents | Agentic Crypto Trading | Execution Control
Survivability-Aware Execution (SAE) is a middleware layer that enforces non-bypassable safety constraints and exposure budgets on agentic trading systems to prevent execution-induced losses from untrusted intents or compromised skills.
Core Problem
In agentic trading, an LLM's 'wrong answers' or a compromised third-party skill translate directly into irreversible financial losses (execution-induced loss), yet most systems lack explicit safety boundaries at the execution layer.
Why it matters:
  • In finance, real-world side effects carry a direct monetary cost; a single hallucinated or injected command can liquidate an account
  • The rise of skill marketplaces (e.g., skills.sh) creates a capability supply chain through which malware or malicious instructions can be imported directly into privileged agents
  • Existing order management systems (OMS) focus on static compliance and lack the context-aware, trust-conditioned tightening needed for non-deterministic AI agents
Concrete Example: A trading agent using an imported skill might be tricked by a prompt injection into requesting 50x leverage on a volatile asset. Without SAE, the request executes and likely leads to liquidation. With SAE, the request is intercepted and projected (clamped) to a safe limit (e.g., 2x) defined in the Intended Policy Spec.
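The clamping step in this example can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: `PolicySpec`, `Intent`, and `project_leverage` are hypothetical names, and the 2x cap is simply the value used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySpec:
    """Hypothetical stand-in for the Intended Policy Spec."""
    max_leverage: float


@dataclass(frozen=True)
class Intent:
    """An untrusted request coming from the agent or an imported skill."""
    symbol: str
    leverage: float


def project_leverage(intent: Intent, spec: PolicySpec) -> Intent:
    """Clamp the requested leverage into [0, spec.max_leverage]."""
    safe = min(max(intent.leverage, 0.0), spec.max_leverage)
    return Intent(symbol=intent.symbol, leverage=safe)


spec = PolicySpec(max_leverage=2.0)
requested = Intent(symbol="BTCUSDT", leverage=50.0)  # injected request
executed = project_leverage(requested, spec)
print(executed.leverage)  # 2.0
```

Requests already inside the limit pass through unchanged, so the projection only alters intents that violate the policy.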
Key Novelty
Survivability-Aware Execution (SAE) Middleware
  • Treats all upstream agent outputs (from LLMs or skills) as 'untrusted intent' rather than executable commands
  • Interposes a strict execution contract between the strategy and the exchange; the contract enforces hard budgets (Projection-based Exposure Budgeting) and allows or denies actions based on a measurable 'Delegation Gap'
  • Dynamically tightens constraints based on a 'trust state' (provenance of skills, injection alerts) and market regimes
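One way to picture trust-conditioned tightening combined with projection onto a budget is the sketch below. Everything here is an assumption made for illustration: the `TrustState` fields, the tightening factors (0.5 and 0.1), and the function names are not taken from the paper, which may define the trust state and the projection differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustState:
    """Hypothetical trust signals; the paper's actual state may differ."""
    skill_provenance_verified: bool
    injection_alert: bool


def effective_budget(base_notional: float, trust: TrustState) -> float:
    """Shrink the exposure budget as trust degrades (illustrative factors)."""
    factor = 1.0
    if not trust.skill_provenance_verified:
        factor *= 0.5  # assumed penalty for an unverified skill
    if trust.injection_alert:
        factor *= 0.1  # assumed penalty for an active injection alert
    return base_notional * factor


def project_notional(requested: float, budget: float) -> float:
    """Clamp the order size to the budget, keeping its long/short sign."""
    magnitude = min(abs(requested), budget)
    return magnitude if requested >= 0 else -magnitude


trusted = TrustState(skill_provenance_verified=True, injection_alert=False)
degraded = TrustState(skill_provenance_verified=False, injection_alert=True)

full_budget = effective_budget(10_000.0, trusted)
tight_budget = effective_budget(10_000.0, degraded)
short_order = project_notional(-8_000.0, tight_budget)
```

The point of the design is that the upstream agent never sets its own limits: the same intent yields a smaller executed order once the trust state degrades.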
Evaluation Highlights
  • Reduces Maximum Drawdown (MDD) by 93.1% (from 46.43% to 3.19%) in a Binance USD-M replay relative to a NoSAE baseline
  • Shrinks tail-risk magnitude (CVaR at the 0.99 level) by ~97.5%, effectively neutralizing catastrophic execution failures during stress periods
  • Reduces the Delegation Gap (DG) loss proxy from 0.647 to 0.019 (~97% reduction) while maintaining a False Block rate of zero in the reported run
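For readers who want to check the arithmetic behind these figures, the sketch below uses common textbook definitions of maximum drawdown and CVaR and recomputes the reported relative reductions. The paper's exact metric definitions are not given here, so these functions are assumptions, and the equity curve and loss list are toy data, not results from the paper.

```python
import math


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def cvar(losses: list[float], alpha: float) -> float:
    """Mean of the worst (1 - alpha) fraction of losses (at least one)."""
    ordered = sorted(losses, reverse=True)
    # round() guards against float noise such as 100 * (1 - 0.99) > 1
    k = max(1, math.ceil(round(len(ordered) * (1 - alpha), 9)))
    return sum(ordered[:k]) / k


def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (before - after) / before


toy_mdd = max_drawdown([100.0, 120.0, 60.0, 90.0])       # 0.5
toy_cvar = cvar([float(i) for i in range(1, 101)], 0.99)  # 100.0

mdd_cut = round(relative_reduction(46.43, 3.19) * 100, 1)   # 93.1
dg_cut = round(relative_reduction(0.647, 0.019) * 100, 1)   # 97.1
```

Both recomputed reductions agree with the rounded percentages quoted in the summary.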
Breakthrough Assessment
8/10
Addresses a critical, under-explored safety gap in autonomous agents: the gap between intent and execution. The shift from 'safety as alignment' to 'safety as execution boundaries' is highly relevant for production deployment.