
Robust, Observable, and Evolvable Agentic Systems Engineering: A Principled Framework Validated via the Fairy GUI Agent

Jiazheng Sun, Ruimeng Yang, Xu Han, Jiayang Niu, Mingxuan Li, Te Yang, Yongyong Lu, Xin Peng
Fudan University
arXiv (2025)
Agent Memory Benchmark

📝 Paper Summary

Agentic AI, Software Engineering, Mobile GUI Agents
Fairy is a mobile GUI agent built on a new engineering framework that combines runtime requirements engineering, architectural observability, and evolutionary memory to address the 'Promptware Crisis' of fragile, opaque agent systems.
Core Problem
Current agentic systems suffer from a 'Promptware Crisis' characterized by ad-hoc development, leading to non-determinism, black-box opacity, and a lack of mechanisms for learning from experience.
Why it matters:
  • Agents perform 'Blind Refinement' (guessing user intent) when instructions are ambiguous, undermining trust and reliability
  • Tightly coupled black-box architectures make debugging and maintaining non-deterministic LLM systems extremely difficult
  • Without formal memory consolidation, agents remain 'eternal novices,' repeating errors instead of evolving through experience
Concrete Example: When facing vague user instructions or missing information, existing agents (such as those using ReAct) often speculate about intent to keep execution moving. This leads to deviated trajectories; the proposed RGR framework instead pauses to clarify 'Runtime Expectations' with the user.
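The contrast between guessing and clarifying can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sub-goal lists the information "slots" it needs: a goal with every slot filled is treated as an executable Requirement, and a goal with missing slots stays an Expectation that triggers a question to the user. The names `Goal`, `classify`, `refine`, and the slot model are hypothetical and are not taken from the Fairy implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    """Hypothetical sub-goal with the information slots it needs to run."""
    description: str
    required_slots: list[str]
    known_slots: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        # Slots a speculating agent would otherwise fill by guessing.
        return [s for s in self.required_slots if s not in self.known_slots]


def classify(goal: Goal) -> str:
    # Fully specified goals are executable "Requirements";
    # goals with missing information remain "Expectations".
    return "requirement" if not goal.missing() else "expectation"


def refine(goal: Goal, ask_user: Callable[[str], str]) -> Goal:
    # Pause and ask the user for each missing slot instead of speculating.
    while classify(goal) == "expectation":
        slot = goal.missing()[0]
        goal.known_slots[slot] = ask_user(f"{goal.description}: which {slot}?")
    return goal


# Usage: the drink size is missing, so the agent asks once before executing.
questions: list[str] = []


def fake_user(question: str) -> str:
    questions.append(question)
    return "large"


order = Goal("Order a coffee", ["shop", "size"], {"shop": "campus cafe"})
print(classify(order))                     # expectation
print(classify(refine(order, fake_user)))  # requirement
print(questions)                           # ['Order a coffee: which size?']
```

In this sketch, the agent asks exactly one question because it fills only the slots that are missing; a fully specified instruction passes straight through without any interaction.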
Key Novelty
Agentic Engineering Framework (RGR + OCA + EMA)
  • Runtime Goal Refinement (RGR): Shifts requirements engineering to runtime, forcing the agent to distinguish between executable 'Requirements' and ambiguous 'Expectations' that need user scaffolding
  • Observable Cognitive Architecture (OCA): Replaces black-box prompts with a white-box architecture that decouples components and separates state from control for better debuggability
  • Evolutionary Memory Architecture (EMA): Implements an execution-evolution dual-loop that transforms ephemeral runtime execution traces into reusable long-term knowledge
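The execution-evolution dual loop can be pictured with a small sketch. It assumes that an execution loop produces short-term traces and that a separate evolution loop consolidates them: successful traces are promoted to reusable long-term "skills" and failures are counted. The data classes hold state, while the two loop functions hold control logic, which loosely echoes OCA's separation of state from control. The names `Trace`, `LongTermMemory`, `execution_loop`, and `evolution_loop`, along with the skill/failure representation, are hypothetical and are not the paper's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Short-term record of one execution episode."""
    task: str
    actions: list[str]
    success: bool


@dataclass
class LongTermMemory:
    """Hypothetical long-term store, kept separate from the control logic."""
    skills: dict[str, list[str]] = field(default_factory=dict)
    failures: dict[str, int] = field(default_factory=dict)


def execution_loop(task: str, memory: LongTermMemory,
                   planner: Callable[[str], list[str]],
                   env: Callable[[list[str]], bool]) -> Trace:
    # Reuse a consolidated skill if one exists; otherwise plan from scratch.
    actions = memory.skills.get(task) or planner(task)
    return Trace(task, list(actions), env(actions))


def evolution_loop(traces: list[Trace], memory: LongTermMemory) -> None:
    # Consolidate ephemeral traces: successes become reusable skills,
    # failures are counted so repeated mistakes can be surfaced.
    for t in traces:
        if t.success:
            memory.skills.setdefault(t.task, t.actions)
        else:
            memory.failures[t.task] = memory.failures.get(t.task, 0) + 1


# Usage: the second run reuses the skill learned from the first.
planner_calls: list[str] = []


def planner(task: str) -> list[str]:
    planner_calls.append(task)
    return ["open_app", "search", "tap_result"]


def env(actions: list[str]) -> bool:
    return actions[-1] == "tap_result"  # toy success check


memory = LongTermMemory()
first = execution_loop("check weather", memory, planner, env)
evolution_loop([first], memory)
second = execution_loop("check weather", memory, planner, env)
print(len(planner_calls), second.success)  # 1 True
```

Here the planner is called only once. After consolidation, the agent stops behaving like an "eternal novice" for this task and replays the stored action sequence instead.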
Architecture
Figure 1: The RGR-I goal refinement process, showing how a Planning Engine decomposes user intent.
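Intent decomposition of the kind shown in the figure can be approximated as building a goal tree. This sketch assumes the Planning Engine is a function that maps an intent to a list of sub-goals, returning an empty list when a goal is atomic. The tree's leaves, in order, are then the steps handed to execution. A lookup table stands in for the LLM, and the names `GoalNode`, `decompose`, and `table_planner` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GoalNode:
    """Hypothetical node in a goal tree: an intent and its sub-goals."""
    intent: str
    children: list[GoalNode] = field(default_factory=list)

    def leaves(self) -> list[str]:
        # Leaves are the atomic sub-goals handed to the executor, in order.
        if not self.children:
            return [self.intent]
        return [leaf for child in self.children for leaf in child.leaves()]


def decompose(intent: str, planner: Callable[[str], list[str]]) -> GoalNode:
    # Recursively expand each intent until the planner returns no sub-goals.
    return GoalNode(intent, [decompose(s, planner) for s in planner(intent)])


# Usage: a lookup table stands in for the LLM-backed Planning Engine.
PLAN = {
    "Book dinner for two": ["Find a restaurant", "Reserve a table"],
    "Find a restaurant": ["Open the maps app", "Search for restaurants"],
}


def table_planner(intent: str) -> list[str]:
    return PLAN.get(intent, [])


tree = decompose("Book dinner for two", table_planner)
print(tree.leaves())
# ['Open the maps app', 'Search for restaurants', 'Reserve a table']
```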
Evaluation Highlights
  • +33.7% improvement in user requirement completion rate by Fairy on RealMobile-Eval compared to the best SoTA baseline
  • OCA (Observable Cognitive Architecture) significantly enhanced system maintainability in human-subject studies, reducing the time required for expert developers to extend the system
  • Empirical validation confirms RGR prevents intent deviation and EMA is crucial for long-term performance
Breakthrough Assessment
8/10
Addresses the critical lack of engineering rigor in Agentic AI. The framework provides structured solutions (RGR, OCA, EMA) to fundamental problems like non-determinism and opacity, with significant empirical gains.