← Back to Paper List

Exploring LLM-Based Agents for Root Cause Analysis

Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, P. Las-Casas, Rodrigo Fonseca, S. Rajmohan
Washington State University, Microsoft
SIGSOFT FSE Companion (2024)
Agent RAG Factuality Reasoning Benchmark

📝 Paper Summary

Agentic RAG pipeline Multi-call tool use with flexible plan
This paper evaluates a ReAct-based LLM agent for cloud incident root cause analysis, demonstrating that agents can perform competitively without fine-tuning by dynamically retrieving historical incidents and diagnostic data.
Core Problem
Automating Root Cause Analysis (RCA) is difficult because static LLMs cannot dynamically query external diagnostic data (logs, metrics) or specialized team knowledge required to diagnose complex cloud incidents.
Why it matters:
  • Cloud incident management is labor-intensive; manual RCA requires deep domain expertise and significant time from On-Call Engineers (OCEs).
  • Prior automated methods (fine-tuned LLMs or classification-based Copilots) lack the agency to fetch new, real-time diagnostic information, limiting their ability to solve novel incidents.
Concrete Example: An incident involving a service failure might require checking specific logs to find a 'connection refused' error. A standard LLM sees only the vague incident report, whereas an agent can decide to query the log database, find the error, and deduce the root cause.
Key Novelty
ReAct Agent for Zero-Shot RCA
  • Adapts the ReAct (Reasoning + Acting) framework to the RCA domain, allowing an LLM to interleave reasoning thoughts with tool execution (e.g., retrieving historical incidents or querying logs).
  • Evaluates the agent in a 'restricted' setting (static dataset) to establish baselines and a 'case study' setting (real production environment) to demonstrate the value of dynamic tool usage.
Evaluation Highlights
  • ReAct agent achieves 64.3% top-5 accuracy on the standard evaluation set, outperforming the RAG baseline (60.6%) while using significantly fewer retrieval tokens.
  • The agent reduces the number of retrieved documents needed: ReAct uses average 5.7 documents vs 10 for baselines, yet achieves higher accuracy.
  • In a qualitative analysis, ReAct effectively utilized specific retrieval queries (e.g., searching for specific error codes) to narrow down root causes where standard similarity search failed.
Breakthrough Assessment
7/10
Solid empirical evaluation of agents in a high-value industrial domain. While not proposing a new architecture, it provides the first rigorous study of ReAct for RCA without fine-tuning, highlighting practical deployment challenges.
×