AIOps: Artificial Intelligence for IT Operations—the application of AI to IT operations tasks such as monitoring and incident resolution
MTTR: Mean Time to Resolution—the average time from the onset of an incident until it is fully resolved
Decision Quality (DQ): A novel metric, introduced in this paper, that measures the validity, specificity, and correctness of LLM recommendations
SLA: Service Level Agreement—a commitment between a service provider and a client, here referring to reliability guarantees
Token overlap: A measure of text similarity based on the number of words or tokens shared between the generated text and a ground-truth reference
TinyLlama: A compact 1.1-billion-parameter language model, used here to demonstrate that architecture matters more than model size
Ollama: A tool for running large language models locally
Quantization: Reducing the precision of model weights (e.g., to 4-bit) to reduce memory usage and increase inference speed
Docker Compose: A tool for defining and running multi-container Docker applications
T2U: Time to Usable Understanding—latency from incident onset to the production of the first actionable output
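The MTTR and T2U entries above both reduce to arithmetic on incident timestamps. The sketch below assumes each incident record carries an onset time, the time the first actionable output appeared, and a resolution time; the `Incident` class and its field names are hypothetical and not taken from the paper's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    onset: datetime             # when the incident began
    first_actionable: datetime  # when the first actionable output was produced
    resolved: datetime          # when the incident was fully resolved


def t2u(incident: Incident) -> timedelta:
    """Time to Usable Understanding: onset -> first actionable output."""
    return incident.first_actionable - incident.onset


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time to Resolution: average onset -> resolution duration."""
    if not incidents:
        raise ValueError("MTTR is undefined for an empty incident list")
    seconds = mean((i.resolved - i.onset).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, 0)
    incidents = [
        Incident(base, base + timedelta(seconds=30), base + timedelta(minutes=20)),
        Incident(base, base + timedelta(seconds=90), base + timedelta(minutes=40)),
    ]
    print([t2u(i).total_seconds() for i in incidents])  # [30.0, 90.0]
    print(mttr(incidents))                              # 0:30:00
```

T2U is computed per incident here; it can be averaged across incidents in the same way MTTR averages resolution times.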
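The Token overlap entry leaves the tokenizer and the normalization open. One plausible reading, sketched below under those assumptions, lowercases the text, splits it into alphanumeric tokens, and reports the fraction of distinct reference tokens that also appear in the generated text; a Jaccard variant is included for comparison. Both `token_overlap` and `jaccard` are hypothetical helpers, not the paper's implementation.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(generated: str, reference: str) -> float:
    """Recall-style overlap: share of distinct reference tokens found in the output."""
    ref = tokenize(reference)
    if not ref:
        return 0.0
    return len(ref & tokenize(generated)) / len(ref)


def jaccard(generated: str, reference: str) -> float:
    """Alternative normalization: shared tokens over the union of both token sets."""
    gen, ref = tokenize(generated), tokenize(reference)
    union = gen | ref
    if not union:
        return 0.0  # no tokens on either side; treated as no measurable overlap
    return len(gen & ref) / len(union)


if __name__ == "__main__":
    ref = "Restart the payment service and clear the cache"
    gen = "Clear the cache, then restart the payment service"
    print(round(token_overlap(gen, ref), 3))  # 0.857 (6 of 7 reference tokens)
    print(jaccard(gen, ref))                  # 0.75 (6 shared of 8 total)
```

The two variants score the same pair differently (6/7 versus 6/8 here), so the normalization matters when comparing reported overlap values.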
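The Quantization entry above can be illustrated with a minimal symmetric, per-tensor 4-bit scheme in plain Python. This is a generic textbook scheme chosen for clarity, not the format used by Ollama or any particular model file; production formats typically compute a separate scale for each small block of weights. The function names are hypothetical.

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 7, the largest positive signed 4-bit value.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_4bit(q: list[int], scale: float) -> list[float]:
    """Recover approximate weights; each error is at most scale / 2."""
    return [v * scale for v in q]


def pack_nibbles(q: list[int]) -> bytes:
    """Store two 4-bit values per byte, offsetting by +8 so each fits in 0..15."""
    padded = q + [0] * (len(q) % 2)  # pad odd-length input with a zero
    return bytes(((hi + 8) << 4) | (lo + 8) for hi, lo in zip(padded[0::2], padded[1::2]))


if __name__ == "__main__":
    weights = [0.5, -1.75, 0.3, 1.0]
    q, scale = quantize_4bit(weights)
    print(q, scale)                   # [2, -7, 1, 4] 0.25
    print(dequantize_4bit(q, scale))  # [0.5, -1.75, 0.25, 1.0]
    print(len(pack_nibbles(q)), "bytes vs", 4 * len(weights), "bytes as float32")
```

Packing two 4-bit codes per byte is where the memory saving comes from: the four weights above occupy 2 bytes (plus one shared scale) instead of 16 bytes as float32, at the cost of rounding error such as 0.3 becoming 0.25.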