
Automatic Database Configuration Debugging using Retrieval-Augmented Language Models

S Chen, J Fan, B Wu, N Tang, C Deng, P Wang, Y Li…
Renmin University of China
arXiv, 12/2024
RAG QA

📝 Paper Summary

Modularized RAG pipeline
Andromeda is an LLM-based RAG framework that automates DBMS configuration debugging by retrieving and aligning heterogeneous data sources (historical questions, manuals, and telemetry logs) to generate precise knob-tuning recommendations.
Core Problem
Database configuration debugging is tedious and requires deep expert knowledge; general-purpose LLMs lack specific domain knowledge, leading to generic, unhelpful advice.
Why it matters:
  • Poorly configured DBMSs suffer from severe performance degradation and runtime errors
  • Human DBAs are expensive and the process is time-consuming even for experts
  • Directly prompting LLMs yields 'technically correct' but vague answers (e.g., 'check your settings') rather than specific actionable values
Concrete Example: A user asks why an INSERT statement is slow. A standard LLM suggests generic advice like 'optimize your query.' Andromeda retrieves a specific historical case where disabling 'foreign_key_checks' solved the issue, a manual entry about disabling 'autocommit', and telemetry showing high 'innodb_log_write_requests', enabling it to recommend specific knob values like 'innodb_buffer_pool_size=2G'.
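The telemetry part of this example can be illustrated with a small sketch. The summary does not say how Andromeda analyzes telemetry, so the z-score rule below is an assumption chosen for simplicity, the function name `flag_high_metrics` is hypothetical, and all sample values are made up for illustration.

```python
from statistics import mean, stdev

def flag_high_metrics(history, current, threshold=3.0):
    """Return metrics whose current value sits far above their baseline.

    Assumption: a plain z-score test stands in for whatever telemetry
    analysis the paper actually uses.
    """
    flagged = {}
    for name, samples in history.items():
        if name not in current or len(samples) < 2:
            continue  # not enough data to estimate a baseline
        mu, sigma = mean(samples), stdev(samples)
        if sigma == 0:
            continue  # constant baseline, z-score undefined
        z = (current[name] - mu) / sigma
        if z > threshold:
            flagged[name] = round(z, 2)
    return flagged

# Illustrative (invented) baseline samples and a current reading.
history = {
    "innodb_log_write_requests": [100, 110, 90, 105, 95],
    "threads_running": [4, 5, 4, 6, 5],
}
current = {"innodb_log_write_requests": 400, "threads_running": 5}

flagged = flag_high_metrics(history, current)
print(flagged)
```

Only the metric that departs sharply from its baseline is flagged, and a short textual description of it could then be placed in the LLM context next to the retrieved question and manual entry.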
Key Novelty
Heterogeneous RAG for DBMS Debugging
  • Unifies diverse retrieval sources (textual manuals vs. problem-solution pairs in historical questions) into a shared embedding space using contrastive learning
  • Integrates telemetry analysis (time-series performance metrics) directly into the RAG context to ground diagnosis in actual system state
  • Employs a two-phase prompting strategy: first identifying relevant knobs, then reasoning about specific values based on retrieved context
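The two-phase prompting idea in the last bullet might look roughly like the sketch below. The prompt wording, the per-knob second phase, and the names `two_phase_recommend` and `fake_llm` are all assumptions made for illustration; the summary does not give the paper's actual prompts. A canned stub replaces a real model so the example runs offline.

```python
from typing import Callable, Dict, List

def two_phase_recommend(
    question: str, context: List[str], llm: Callable[[str], str]
) -> Dict[str, str]:
    """Phase 1 asks which knobs matter; phase 2 asks for a value per knob."""
    ctx = "\n".join(f"- {c}" for c in context)

    # Phase 1: identify relevant knobs only, without values.
    knob_prompt = (
        f"Question: {question}\nContext:\n{ctx}\n"
        "List the configuration knobs relevant to this problem, comma-separated."
    )
    knobs = [k.strip() for k in llm(knob_prompt).split(",") if k.strip()]

    # Phase 2: reason about a concrete value for each identified knob.
    recommendations = {}
    for knob in knobs:
        value_prompt = (
            f"Question: {question}\nContext:\n{ctx}\n"
            f"Recommend a value for the knob {knob}. Answer with the value only."
        )
        recommendations[knob] = llm(value_prompt).strip()
    return recommendations

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, returning canned answers."""
    if "List the configuration knobs" in prompt:
        return "innodb_buffer_pool_size, autocommit"
    if "knob innodb_buffer_pool_size" in prompt:
        return "2G"
    return "OFF"

recs = two_phase_recommend(
    "Why is my INSERT statement slow?",
    ["Manual: disabling autocommit can help bulk inserts.",
     "Telemetry: innodb_log_write_requests is unusually high."],
    fake_llm,
)
print(recs)
```

Separating knob selection from value selection keeps each prompt narrow, which is one plausible reason for the split described above.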
Architecture
Figure 3: Overview of the Andromeda framework, split into Offline and Online stages.
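The offline/online split can be pictured as "index once, query many times". The sketch below is a toy illustration only: a bag-of-words vector stands in for the contrastively trained embedding the paper describes, and `ToyIndex`, `embed`, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a learned encoder: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class ToyIndex:
    def __init__(self):
        self.entries = []  # (source, text, vector)

    def add(self, source, text):
        """Offline stage: embed each document once and store it."""
        self.entries.append((source, text, embed(text)))

    def search(self, query, k=2):
        """Online stage: embed the query and rank all stored documents."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]

# Offline: documents from different sources share one vector space.
index = ToyIndex()
index.add("historical_question", "insert statement slow fixed by disabling foreign_key_checks")
index.add("manual", "disabling autocommit can speed up bulk insert operations")
index.add("manual", "max_connections limits the number of client connections")

# Online: a user question retrieves context across both source types.
hits = index.search("why is my insert statement slow", k=2)
for source, text in hits:
    print(source, "|", text)
```

Because both source types are embedded by the same function, a single ranking can mix historical questions and manual entries, which is the property the shared embedding space is meant to provide.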
Evaluation Highlights
  • Reported to significantly outperform existing solutions on real-world DBMS configuration debugging datasets (a qualitative claim from the abstract; no specific numbers are available for this summary)
  • Effectively retrieves domain-specific context from multiple sources (historical questions, manuals, telemetry) to improve diagnosis accuracy
Breakthrough Assessment
7/10
Novel application of RAG to a highly technical, heterogeneous domain (DBMS tuning). The integration of time-series telemetry with textual retrieval is a significant architectural advance for domain-specific agents.