
Process-Centric Analysis of Agentic Software Systems

Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand
University of Illinois Urbana–Champaign, IBM Research
arXiv (2025)
Agent, Benchmark, Reasoning

📝 Paper Summary

Agentic software systems, Agent evaluation methodologies, Software engineering agents
The paper introduces Graphectory, a graph-based representation of agent trajectories, to analyze how agents solve problems rather than only whether they succeed. This process view enables real-time detection and correction of inefficient strategies.
Core Problem
Current agent evaluation is outcome-centric (success or failure), which masks the recurrent inefficiencies, chaotic behaviors, and missing validation in the underlying trajectories, whether they happen to end in success or failure.
Why it matters:
  • Outcome-centric metrics fail to explain how agents reason, plan, or adapt strategies, preventing systematic improvements.
  • Agents often succeed by chance despite inefficient processes (e.g., editing files line by line rather than applying a patch), and outcome metrics treat efficient and inefficient runs identically.
  • Without process visibility, it is difficult to distinguish systematic reasoning from stochastic luck or to intervene when agents get stuck in loops.
Concrete Example: Two agents fix the same bug (django-10973). SWE-agentDev succeeds but takes 15 steps, with repetitive edits and weak validation. SWE-agentDSK-V3 also succeeds, in only 9 steps, but skips validation entirely. Outcome metrics rate the two runs as equal (both 'Success'), hiding the risky no-validation strategy of the second agent.
Key Novelty
Graphectory and Langutory: Graph-based Trajectory Representation
  • Encodes linear agent logs into a graph (Graphectory) where nodes are actions and edges capture both temporal sequence and structural navigation (e.g., file hierarchy).
  • Abstracts this graph into a string sequence (Langutory) representing logical phases (Localization, Patching, Validation), enabling regex-like pattern mining for strategy analysis.
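The two representations above can be illustrated with a minimal sketch. It is not the paper's implementation: the action names, the `PHASE_OF` mapping, the single-letter phase symbols (L, P, V), and the path-based `structural_relation` rule are all assumptions made for illustration. Repeated (action, target) pairs collapse into one node, so revisits show up as cycles rather than as a longer chain.

```python
import re
import posixpath
from dataclasses import dataclass, field

# Hypothetical mapping from raw action names to phase symbols (an assumption):
# L = Localization, P = Patching, V = Validation.
PHASE_OF = {
    "search": "L", "open": "L",
    "edit": "P", "create": "P",
    "run_tests": "V",
}

@dataclass
class Graphectory:
    nodes: list = field(default_factory=list)  # unique (action, target) pairs
    edges: list = field(default_factory=list)  # (src_index, dst_index, kind)

def structural_relation(a, b):
    """Assumed rule for how two paths relate in the file hierarchy."""
    if a == b:
        return "same"
    if posixpath.dirname(a) == posixpath.dirname(b):
        return "sibling"
    if b.startswith(a.rstrip("/") + "/"):
        return "descend"
    if a.startswith(b.rstrip("/") + "/"):
        return "ascend"
    return None

def build_graphectory(steps):
    """steps: list of (action, target_path) tuples in log order."""
    g, index, prev = Graphectory(), {}, None
    for action, target in steps:
        key = (action, target)
        if key not in index:  # a revisited step reuses its existing node
            index[key] = len(g.nodes)
            g.nodes.append(key)
        cur = index[key]
        if prev is not None:
            g.edges.append((prev, cur, "temporal"))
            rel = structural_relation(g.nodes[prev][1], target)
            if rel is not None:
                g.edges.append((prev, cur, rel))
        prev = cur
    return g

def to_langutory(steps):
    """Abstract the step sequence into a string of phase symbols."""
    return "".join(PHASE_OF.get(action, "?") for action, _ in steps)

def has_validation(langutory):
    return "V" in langutory

def repeated_edit_runs(langutory):
    """Regex mining: runs of two or more consecutive patching steps."""
    return re.findall(r"P{2,}", langutory)

# Hypothetical trajectory; the paths are made up.
steps = [
    ("search", "pkg"),
    ("open", "pkg/client.py"),
    ("edit", "pkg/client.py"),
    ("edit", "pkg/client.py"),
    ("run_tests", "tests"),
]
graph = build_graphectory(steps)
lang = to_langutory(steps)
```

Once a trajectory is a string, strategy questions such as "did the agent ever validate?" or "did it edit repeatedly without testing?" reduce to ordinary string and regex checks, which is the appeal of the Langutory abstraction.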
Architecture
Figure 2: Visual comparison of the Graphectory and Langutory of two agents (SWE-agentDev vs. SWE-agentDSK-V3) solving the same issue.
Evaluation Highlights
  • Online monitoring with intervention improved resolution rates by 11.9% on average (up to 23.5%) across problematic instances.
  • Intervention repaired trajectories in 94.1% of consistent failure cases (86 instances), turning chaotic loops into valid workflows.
  • Analysis of 4,000 trajectories shows that stronger LLMs (e.g., Claude 4) produce structurally more complex trajectories (higher node and edge counts), reflecting deeper exploration.
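This summary does not say how the paper's online monitor decides when to intervene. As a rough illustration only, the sketch below assumes two hypothetical triggers: the same (action, target) step recurring several times within a short window, and a submission that follows a patch with no validation step in between. The class name, thresholds, and warning labels are invented for this example.

```python
from collections import Counter, deque

class TrajectoryMonitor:
    """Toy online monitor; the rules and thresholds are assumptions."""

    def __init__(self, window=6, max_repeats=3):
        self.recent = deque(maxlen=window)  # sliding window of recent steps
        self.max_repeats = max_repeats
        self.validated_since_patch = True

    def observe(self, action, target, phase):
        """Record one step and return any warnings it triggers."""
        self.recent.append((action, target))
        warnings = []
        # Loop heuristic: identical step seen too often in the window.
        if Counter(self.recent)[(action, target)] >= self.max_repeats:
            warnings.append("loop")
        if phase == "P":
            self.validated_since_patch = False
        elif phase == "V":
            self.validated_since_patch = True
        return warnings

    def on_submit(self):
        """Flag a submission whose latest patch was never validated."""
        return [] if self.validated_since_patch else ["unvalidated_patch"]

monitor = TrajectoryMonitor()
first = monitor.observe("edit", "a.py", "P")
second = monitor.observe("edit", "a.py", "P")
third = monitor.observe("edit", "a.py", "P")
before_validation = monitor.on_submit()
monitor.observe("run_tests", "tests", "V")
after_validation = monitor.on_submit()
```

In a real harness, a warning would presumably be turned into feedback injected into the agent's context; how that feedback is phrased and when it is delivered is outside what this sketch covers.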
Breakthrough Assessment
8/10
Significant shift from outcome-based to process-based evaluation. The Graphectory abstraction is a powerful tool for debugging agents, and the demonstrated ability to correct agent behavior in real time is highly impactful.