SLM: Small Language Model, defined here as a model that fits on a consumer device and runs with low latency (typically <10B parameters)
LLM: Large Language Model—a generalist model significantly larger than an SLM (typically >10B parameters, often requiring data center GPUs)
Agentic System: Software that uses language models to make decisions, control program flow, and invoke tools in order to complete tasks
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains small low-rank matrices added to selected layers
DoRA: Weight-Decomposed Low-Rank Adaptation, an extension of LoRA that decomposes each pretrained weight matrix into magnitude and direction components and applies the low-rank update to the direction
FLOPs: Floating Point Operations, a count of the arithmetic operations needed to run a model, used as a hardware-independent measure of computational cost
Heterogeneous Agentic Systems: Systems that use multiple different models (mixing SLMs and LLMs) rather than a single monolithic model for all tasks
Mamba: A state-space model architecture whose compute scales linearly with sequence length (unlike the quadratic scaling of Transformer self-attention), making long-context inference more efficient
Self-consistency: A reasoning technique in which a model samples multiple reasoning paths for the same question and returns the most frequent final answer (a majority vote) to improve accuracy
Tool calling: The ability of a language model to output structured text (like JSON) to invoke external software functions
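
To make the tool-calling entry concrete, here is a minimal sketch of how an agentic system might turn a model's structured output into a function call. The tool name `get_weather`, the `TOOLS` registry, the `dispatch` helper, and the JSON shape (`name` plus `arguments`) are all assumptions made for illustration; real model APIs define their own schemas, and no model is actually invoked here.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned string instead of calling a real API."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables (names are illustrative).
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by a model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    # Pass the model-supplied arguments through as keyword arguments.
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text a model might emit; assumed format, not a real API response.
example_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(example_output)
```

Rejecting unknown tool names before dispatch is one simple guard; a production system would likely also validate the arguments against a schema before calling the function.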