MLLM: Multimodal Large Language Model, an AI system that processes visual inputs such as images or video together with text; many produce only text output, while some can also generate images or video
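Mamba: A selective state space model (SSM) architecture whose computation scales linearly with sequence length, unlike the quadratic cost of self-attention in Transformers; during generation it carries a fixed-size recurrent state instead of a growing KV-Cache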
MoE: Mixture of Experts, an architecture in which a router activates only a small subset of parameter groups (experts) for each input token, increasing total model capacity without a proportional increase in per-token compute (all experts must still be held in memory)
ICL: In-Context Learning, the ability of a model to perform a task from examples provided within the prompt, without updating its weights
KV-Cache: Key-Value Cache, memory used during autoregressive generation to store the attention key and value vectors of previously processed tokens so they are not recomputed at every step; in Transformers it grows linearly with sequence length
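GQA: Grouped Query Attention, an attention variant in which query heads are split into groups that each share a single key/value head, shrinking the KV-Cache and the memory bandwidth needed to read it; it sits between multi-head attention (one KV head per query head) and multi-query attention (one KV head in total)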
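SwiGLU: A gated activation used in the feed-forward layers of many modern LLMs (e.g., LLaMA); it is a Gated Linear Unit variant that uses Swish as the gate, SwiGLU(x) = Swish(xW) ⊙ (xV), and has been reported to improve quality over ReLU or GELU feed-forward layers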
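FLOPs: Floating-point operations, a hardware-independent count of the additions and multiplications a computation performs, used as a measure of computational cost; not to be confused with FLOPS (floating-point operations per second), a measure of hardware throughput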
VNBench: A synthetic video benchmark designed to evaluate atomic capabilities like retrieval, ordering, and counting in video models
Needle-In-A-Haystack: An evaluation method in which a specific piece of information (the needle) is hidden within a large amount of irrelevant data (the haystack) to test whether a model can retrieve it, typically across long contexts
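The Mamba entry contrasts linear and quadratic scaling in sequence length. A minimal sketch of that contrast, under heavy simplifying assumptions, follows: a plain diagonal linear state space recurrence (h_t = a ⊙ h_{t-1} + b·x_t, y_t = c · h_t) run as a single left-to-right scan, next to a count of the scores causal self-attention computes. This is not Mamba's selective scan, which makes the SSM parameters input-dependent and uses a hardware-aware parallel implementation; all function names here are hypothetical.

```python
from typing import List, Sequence


def linear_ssm_scan(
    xs: Sequence[float], a: Sequence[float], b: Sequence[float], c: Sequence[float]
) -> List[float]:
    """Toy diagonal linear SSM over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise; diagonal state matrix)
    y_t = sum(c * h_t)
    Not Mamba's selective scan: a, b, c here are fixed, not input-dependent.
    """
    state = [0.0] * len(a)  # fixed-size state, independent of sequence length
    ys: List[float] = []
    for x in xs:
        state = [ai * hi + bi * x for ai, hi, bi in zip(a, state, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, state)))
    return ys


def ssm_state_updates(seq_len: int, state_size: int) -> int:
    # One update per state dimension per token: O(L * N), linear in L.
    return seq_len * state_size


def causal_attention_scores(seq_len: int) -> int:
    # Token t attends to tokens 0..t: 1 + 2 + ... + L = L(L+1)/2 scores, O(L^2).
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    print(linear_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
    for L in (1_000, 10_000):
        print(L, ssm_state_updates(L, 16), causal_attention_scores(L))
```

Growing L tenfold multiplies the SSM update count by 10 but the attention score count by roughly 100.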
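The MoE entry describes activating only a subset of experts per input. The sketch below shows one common routing scheme, top-k selection with a softmax over the selected router logits, for a scalar input. It is a toy under stated assumptions: the router is a hypothetical per-expert scalar weight standing in for a learned linear layer, and `moe_forward` is an illustrative name, not any library's API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_weights: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Toy top-k MoE layer for a scalar input (hypothetical, for illustration).

    The router scores every expert, but only the k highest-scoring experts
    are evaluated; their outputs are mixed with softmax weights computed
    over the selected logits only.
    """
    logits = [w * x for w in router_weights]  # stand-in for a learned linear router
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    output = sum(g * experts[i](x) for g, i in zip(gates, top))
    return output, top


if __name__ == "__main__":
    experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
    out, chosen = moe_forward(2.0, [0.0, 0.0, 2.5, 2.5], experts, k=2)
    print(out, chosen)  # 7.0 [2, 3]
```

Only two of the four experts run, so per-token compute tracks k rather than the total expert count. Production MoE layers route vectors rather than scalars and usually add a load-balancing loss; both are omitted here.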
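The ICL entry says the examples live in the prompt rather than in the weights. A minimal sketch of a few-shot prompt builder makes that concrete; the "Input:"/"Output:" labels and the translation pairs are hypothetical formatting choices, not a required template.

```python
from typing import Sequence, Tuple


def few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Format demonstrations plus a new query as a single prompt string.

    The model's weights are never touched; any "learning" is whatever the
    model infers from the demonstrations at inference time.
    """
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    blocks.append(f"{input_label}: {query}\n{output_label}:")  # left open for the model
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(few_shot_prompt([("cat", "chat"), ("dog", "chien")], "bird"))
```

Changing the task only requires changing the demonstrations; the same frozen model receives a different string.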
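The KV-Cache entry describes storing past attention state so it is not recomputed. The sketch below shows single-head decoding with such a cache for one layer. It assumes the query, key, and value vectors for each new token are already given (in a real model they come from learned projections of the hidden state), and `KVCache` and `decode_step` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


class KVCache:
    """Stores one key and one value vector per processed token."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def __len__(self) -> int:
        return len(self.keys)


def decode_step(cache: KVCache, q: Vector, k: Vector, v: Vector) -> Vector:
    # Append only the new token's key/value; earlier entries are reused as-is
    # instead of being recomputed from the whole prefix.
    cache.keys.append(k)
    cache.values.append(v)
    return attend(q, cache.keys, cache.values)


if __name__ == "__main__":
    cache = KVCache()
    print(decode_step(cache, [1.0, 0.0], [1.0, 0.0], [3.0, 4.0]))  # [3.0, 4.0]
    print(decode_step(cache, [0.0, 0.0], [0.0, 1.0], [5.0, 6.0]))  # [4.0, 5.0]
    print(len(cache))  # 2
```

Each step adds one entry, which is why cache memory grows linearly with the number of generated tokens.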
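The GQA entry says query heads share key/value heads to save memory. The arithmetic below sketches both the head mapping and the resulting KV-Cache size. The model shape (32 layers, 32 query heads, head dimension 128, a 4096-token context) and the 2-byte fp16/bf16 element size are hypothetical values chosen for illustration.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head that query head `q_head` reads."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per KV head
    return q_head // group_size


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    # Factor 2: one key and one value vector per KV head, per token, per layer.
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 32 query heads, head_dim 128, 4096-token context.
    mha = kv_cache_bytes(4096, 32, n_kv_heads=32, head_dim=128)  # one KV head per query head
    gqa = kv_cache_bytes(4096, 32, n_kv_heads=8, head_dim=128)   # groups of 4 query heads
    mqa = kv_cache_bytes(4096, 32, n_kv_heads=1, head_dim=128)   # a single shared KV head
    print(mha // 2**20, gqa // 2**20, mqa // 2**20)  # MiB: 2048 512 64
```

With 8 KV heads instead of 32, the cache (and the bytes read per decoding step) shrinks by 4x while all 32 query heads are kept.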
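The SwiGLU entry combines a Swish gate with a Gated Linear Unit. A minimal pure-Python sketch of a SwiGLU feed-forward block, FFN(x) = (Swish(xW) ⊙ xV)W2 with Swish at β = 1 (SiLU), is shown below. The parameter names `w_gate`, `w_up`, and `w_down` are illustrative labels for W, V, and W2, and the weights in the test are made-up toy values.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_in, d_out)


def silu(z: float) -> float:
    # Swish with beta = 1 (also called SiLU): z * sigmoid(z).
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return z * e / (1.0 + e)


def matvec(x: Vector, w: Matrix) -> Vector:
    # Row vector x (length d_in) times matrix w (d_in x d_out).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """FFN(x) = (Swish(x W) * (x V)) W2, with * the elementwise product."""
    gate = [silu(g) for g in matvec(x, w_gate)]
    up = matvec(x, w_up)
    hidden = [g * u for g, u in zip(gate, up)]  # the gate scales each hidden unit
    return matvec(hidden, w_down)
```

Compared with a plain two-matrix feed-forward layer, the gated form uses three weight matrices, so implementations often shrink the hidden width to keep the parameter count comparable.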
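The FLOPs entry treats operation counts as a cost measure. The sketch below shows the standard count for a matrix product, 2·m·k·n for an (m × k) by (k × n) multiply, checked against a naive loop that tallies its own operations, plus the common rule of thumb of about 2 FLOPs per parameter per token for a dense Transformer forward pass. The function names are hypothetical, and the rule of thumb ignores attention-score terms.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul_flops(m: int, k: int, n: int) -> int:
    # (m x k) @ (k x n): m*n outputs, each needing k multiplies and k adds.
    return 2 * m * k * n


def naive_matmul_counting(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Plain triple-loop matmul that also counts floating-point operations."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    flops = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
                flops += 2  # one multiply + one add
            c[i][j] = acc
    return c, flops


def dense_forward_flops_per_token(n_params: int) -> int:
    # Rule of thumb for a dense Transformer: ~2 FLOPs per parameter per token
    # in the forward pass. For an MoE model, n_params would be the *active*
    # parameters per token rather than the total.
    return 2 * n_params
```

For example, a hypothetical 7-billion-parameter dense model would need roughly 1.4 × 10^10 FLOPs per token in the forward pass.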
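The Needle-In-A-Haystack entry describes hiding a fact in irrelevant text. A minimal sketch of building one test prompt and scoring an answer follows. It assumes the needle is placed at a relative depth within a list of filler sentences and that a case-insensitive substring match counts as a correct retrieval; the function names are hypothetical, and the model call itself is left out.

```python
from typing import Sequence


def build_haystack_prompt(filler: Sequence[str], needle: str, depth: float, question: str) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    pos = round(depth * len(filler))
    sentences = list(filler[:pos]) + [needle] + list(filler[pos:])
    return " ".join(sentences) + "\n\nQuestion: " + question


def is_retrieved(answer: str, expected: str) -> bool:
    # Simple substring scoring; real benchmarks may use stricter or model-based grading.
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    prompt = build_haystack_prompt(
        ["The sky was grey."] * 6, "The secret number is 7481.", 0.5, "What is the secret number?"
    )
    print(prompt)
```

Sweeping `depth` and the amount of filler produces a grid of prompts that shows where in a long context retrieval starts to fail.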