LoRA: Low-Rank Adaptation—a technique to fine-tune large models by training small rank-decomposition matrices while keeping the main weights frozen
SLM: Small Language Model—compact AI models designed for efficiency, often deployable on consumer hardware
GQA: Grouped-Query Attention—an attention mechanism in which groups of query heads share a single key/value head, shrinking the KV cache and therefore memory usage during generation
RoPE: Rotary Positional Embeddings—a method to encode token position information into the attention mechanism using rotation matrices
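SigLIP: Sigmoid Loss for Language Image Pre-training—an image-text pre-training method that replaces CLIP's softmax contrastive loss with a pairwise sigmoid loss; the resulting vision encoder is commonly used to extract features from images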
Conformer: A model architecture combining Convolutional Neural Networks and Transformers, commonly used for audio/speech processing
CoT: Chain-of-Thought—a prompting or training technique where the model generates intermediate reasoning steps before the final answer
DPO: Direct Preference Optimization—a method to align models with human preferences by optimizing directly on ranked outputs without a separate reward model
SFT: Supervised Fine-Tuning—training a model on labeled examples (instruction-response pairs) to teach it how to follow instructions
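log-Mel filter-bank: An audio representation computed by passing each frame's short-time spectrum through a bank of filters spaced on the Mel scale (which approximates human pitch perception) and taking the logarithm of each filter's energy, yielding a spectrogram-like time-frequency image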
KV cache: Key-Value cache—memory that stores the attention key and value tensors of previous tokens so they are not recomputed at every step of text generation
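
To make the LoRA entry concrete, here is a minimal sketch in plain Python of a single adapted linear layer. The function names (`matvec`, `lora_forward`, `trainable_params`) and the toy matrices are assumptions made for illustration, not taken from any particular library. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
# Illustrative sketch only: names and toy values are assumptions.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x).

    W: frozen weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = len(A)                        # rank of the low-rank update
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, r):
    """Compare parameters trained by LoRA with full fine-tuning of one matrix."""
    return {"lora": r * (d_in + d_out), "full": d_out * d_in}

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # d_out = 2, d_in = 3
A = [[0.5, 0.5, 0.5]]                    # r = 1
B_zero = [[0.0], [0.0]]                  # B is commonly initialised to zero
x = [1.0, 2.0, 3.0]

# With B at zero the adapted layer reproduces the frozen layer exactly.
y_init = lora_forward(W, A, B_zero, x, alpha=2.0)
```

For a hypothetical 4096 x 4096 weight at rank 8, `trainable_params(4096, 4096, 8)` reports 65,536 trainable values against 16,777,216 for full fine-tuning, which is where the efficiency of the technique comes from.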
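
The GQA and KV cache entries can be tied together with a little arithmetic. The sketch below estimates cache size from the number of key/value heads and shows one way query heads might be assigned to shared key/value heads. The model dimensions are hypothetical values chosen for the example, and the helper names are assumptions.

```python
# Illustrative sketch only: the configuration below is hypothetical.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes.

    The leading factor 2 accounts for storing one key tensor and one
    value tensor per layer; bytes_per_value=2 corresponds to 16-bit floats.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

def query_to_kv_head(query_head, num_query_heads, num_kv_heads):
    """Map a query head index to the key/value head its group shares."""
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size

# Hypothetical model: 32 layers, 32 query heads, head_dim 128, 4096 tokens.
mha_bytes = kv_cache_bytes(32, num_kv_heads=32, head_dim=128, seq_len=4096)
gqa_bytes = kv_cache_bytes(32, num_kv_heads=8, head_dim=128, seq_len=4096)

# With 8 query heads and 2 key/value heads, each group holds 4 query heads.
mapping = [query_to_kv_head(h, 8, 2) for h in range(8)]
```

Under these assumed dimensions, going from 32 key/value heads to 8 cuts the cache from 2 GiB to 512 MiB, a reduction equal to the group size of 4.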
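
The RoPE entry can also be sketched in a few lines. The version below rotates consecutive pairs of vector components by an angle proportional to the token position. This interleaved pairing and the base of 10000 are common choices but are assumptions here, and some implementations pair the first half of the vector with the second half instead. The function names are illustrative.

```python
import math

# Illustrative sketch only: pairing scheme and base are assumed defaults.

def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by position * base**(-i / d)."""
    d = len(vec)  # must be even
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are 2 tokens apart, so the scores should match.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 12), rope_rotate(k, 10))
```

The two scores agree up to floating-point error because the rotated dot product depends only on the distance between the two positions, which is the property that lets attention see relative position.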