
STRIDE: A Systematic Framework for Selecting AI Modalities -- Agentic AI, AI Assistants, or LLM Calls

Shubhi Asthana, Bing Zhang, Chad DeLuca, Ruchi Mahindru, Hima Patel
IBM Research
arXiv (2025)
Agent Reasoning Benchmark

📝 Paper Summary

Agentic Architecture Design · Task Complexity Analysis
STRIDE is a design-time framework that systematically analyzes task complexity and dynamism to recommend whether a problem requires a simple LLM call, a guided assistant, or a fully autonomous agent.
Core Problem
Organizations currently deploy expensive, risky autonomous agents indiscriminately for tasks that could be solved by simpler methods, leading to over-engineering and governance issues.
Why it matters:
  • Overusing agents wastes compute resources and increases latency for simple queries.
  • Unnecessary agent autonomy introduces risks such as data leaks and system instability (e.g., recursive loops).
  • Principled, evidence-based frameworks for deciding at design time whether a task actually needs an agent are lacking; most choices are intuition-driven.
Concrete Example: A task like 'Generate a random greeting message' has output variability due to model stochasticity, but does not require an agent. Current approaches might mistake this variability for complexity and deploy an agent, whereas STRIDE identifies it as model-induced and recommends a stateless LLM call.
Key Novelty
Systematic Task Reasoning Intelligence Deployment Evaluator (STRIDE)
  • A 'shift-left' decision framework that operates at design time rather than deployment time, preventing over-engineering before code is written.
  • Introduces a 'True Dynamism Score' that distinguishes between variability caused by the model (randomness), tools (API volatility), and the workflow itself (conditional branching); only workflow-induced variability justifies a full agent.
  • Calculates an Agentic Suitability Score (ASS) based on reasoning depth, tool needs, state requirements, and self-reflection necessity.
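The summary names the ingredients of both scores but not their formulas. The sketch below illustrates the idea under stated assumptions: dimension ratings and variability components are normalized to [0, 1], `ASS_WEIGHTS` holds made-up weights, and `VariabilityProfile`, `observed_variability`, `true_dynamism_score`, and `agentic_suitability_score` are hypothetical names, not the paper's implementation. Treating the True Dynamism Score as simply the workflow component is likewise an assumption.

```python
from dataclasses import dataclass

# Hypothetical weights: the summary lists the four ASS dimensions but not
# how STRIDE combines them. These values are illustrative and sum to 1.0.
ASS_WEIGHTS = {
    "reasoning_depth": 0.30,
    "tool_needs": 0.25,
    "state_requirements": 0.25,
    "self_reflection": 0.20,
}


@dataclass(frozen=True)
class VariabilityProfile:
    """Observed output variability split by source, each assumed in [0, 1]."""
    model: float     # stochastic sampling of the LLM itself
    tool: float      # volatility of external APIs or tools
    workflow: float  # conditional branching in the task's own logic


def observed_variability(v: VariabilityProfile) -> float:
    """Naive measure that lumps all sources together (capped at 1)."""
    return min(1.0, v.model + v.tool + v.workflow)


def true_dynamism_score(v: VariabilityProfile) -> float:
    """Keep only workflow-induced variability; model and tool noise are ignored."""
    return v.workflow


def agentic_suitability_score(dims: dict[str, float]) -> float:
    """Weighted sum of the four dimension ratings, each assumed in [0, 1]."""
    missing = ASS_WEIGHTS.keys() - dims.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(weight * dims[name] for name, weight in ASS_WEIGHTS.items())


# The greeting-message task: high variability, but all of it model-induced.
greeting = VariabilityProfile(model=0.9, tool=0.0, workflow=0.0)
print(observed_variability(greeting))  # 0.9
print(true_dynamism_score(greeting))   # 0.0
```

For the greeting-message example, a naive measure sees high variability (0.9), while the workflow-only score is 0.0, so nothing about the task's variability argues for an agent. A linear weighted sum is used here only because it keeps each dimension's contribution easy to inspect.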
Architecture
Figure 1: The STRIDE workflow pipeline, from task input to modality recommendation.
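The summary does not say how the scores are turned into a final recommendation. A minimal sketch of that last step, assuming both scores lie in [0, 1] and using invented cut-offs (`ASSISTANT_THRESHOLD`, `AGENT_THRESHOLD`, `DYNAMISM_THRESHOLD`, and `recommend_modality` are all hypothetical), could route a task to one of the three modalities like this:

```python
from enum import Enum


class Modality(Enum):
    LLM_CALL = "stateless LLM call"
    ASSISTANT = "guided AI assistant"
    AGENT = "autonomous agent"


# Hypothetical cut-offs; the summary does not state STRIDE's thresholds.
ASSISTANT_THRESHOLD = 0.4
AGENT_THRESHOLD = 0.7
DYNAMISM_THRESHOLD = 0.5


def recommend_modality(ass: float, true_dynamism: float) -> Modality:
    """Map an Agentic Suitability Score and a True Dynamism Score to a modality.

    A full agent is recommended only when suitability is high AND the
    variability comes from the workflow itself; otherwise the task is
    routed to a cheaper modality.
    """
    if not (0.0 <= ass <= 1.0 and 0.0 <= true_dynamism <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if ass >= AGENT_THRESHOLD and true_dynamism >= DYNAMISM_THRESHOLD:
        return Modality.AGENT
    if ass >= ASSISTANT_THRESHOLD:
        return Modality.ASSISTANT
    return Modality.LLM_CALL


print(recommend_modality(0.1, 0.0).value)    # stateless LLM call
print(recommend_modality(0.8, 0.2).value)    # guided AI assistant
print(recommend_modality(0.85, 0.75).value)  # autonomous agent
```

The second call shows the intended guard: a task can score high on suitability yet still be kept off the agent path because its variability is not workflow-driven.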
Evaluation Highlights
  • Achieved 92% accuracy in modality selection across 30 real-world tasks in the site reliability engineering (SRE) and compliance domains.
  • Reduced unnecessary agent deployments by 45% compared to baseline intuition-driven choices.
  • Cut resource costs by 37% by routing simpler tasks to less expensive modalities.
Breakthrough Assessment
7/10
Significant practical contribution for enterprise AI adoption. While not a new model architecture, it provides a much-needed formal methodology for architectural decision-making, addressing the 'agent bloat' problem effectively.