LLM: Large Language Model—An AI model trained on vast text data to generate human-like text
MLLM: Multimodal Large Language Model—An LLM capable of processing and generating content across multiple modalities such as text, images, and video
PEFT: Parameter-Efficient Fine-Tuning—Techniques to adapt large models by updating only a small subset of parameters
LoRA: Low-Rank Adaptation—A PEFT technique that injects trainable low-rank matrices into transformer layers while freezing the main weights
RLHF: Reinforcement Learning from Human Feedback—Fine-tuning models using reward signals derived from human preferences
SFT: Supervised Fine-Tuning—Training a model on labeled input-output pairs to follow instructions
DPO: Direct Preference Optimization—An alignment algorithm that optimizes a policy directly on preference data without an explicit reward model
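GRPO: Group Relative Policy Optimization—A reinforcement learning method that scores each sampled response relative to the other responses in a group generated for the same prompt, avoiding a separate value (critic) model; often used to improve reasoning capabilities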
Quantization: Reducing the precision of model parameters (e.g., from 16-bit to 4-bit) to save memory and speed up inference
vLLM: A high-throughput and memory-efficient inference engine for LLMs
Megatron: NVIDIA's framework (Megatron-LM) for training massive language models using tensor and pipeline model parallelism
Pass@K: A metric measuring the probability that at least one of K sampled code solutions for a problem is correct (typically, passes its unit tests)
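
The Pass@K entry can be made concrete with a small calculation. The sketch below assumes the common setup in which n solutions are sampled per problem and c of them pass the tests; Pass@K is then estimated as 1 - C(n-c, K) / C(n, K), the chance that a random subset of K samples contains at least one correct solution. The function name `pass_at_k` and its argument names are illustrative, not taken from any particular library.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@K for one problem (illustrative helper).

    n: total number of sampled solutions
    c: number of those samples that are correct
    k: number of samples the metric is allowed to "draw"
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 samples, 3 correct, evaluated at K = 1 and K = 5.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

A benchmark score is usually the mean of this per-problem value over all problems.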