Log-likelihood default: Evaluating a model by computing the log-probability of each pre-defined answer option (e.g., A, B, C, D) given the context, then selecting the highest-scoring option
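A minimal sketch of this scoring mode. The `OPTION_PROBS` table is an invented stand-in for a real model's next-token distribution, and the function names are hypothetical; a real harness would sum per-token log-probs of each option's tokenization.

```python
import math

# Invented stand-in for a real LM's distribution over answer letters.
OPTION_PROBS = {"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.1}

def option_loglikelihood(option):
    """Log P(option | context); a real harness sums per-token log-probs."""
    return math.log(OPTION_PROBS[option])

def pick_option(options=("A", "B", "C", "D")):
    # Every option is scored in place -- no text is generated.
    return max(options, key=option_loglikelihood)
```

The key contrast with generation-based evaluation: the model never produces free-form text, so parsing errors cannot occur, but the option set must be fixed in advance.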
Generation until: Evaluating a model by letting it generate text autoregressively until a stop sequence is produced, then parsing the answer from the output
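A sketch of the generation loop, assuming a hypothetical `step` function that returns the next token; here a scripted toy stands in for a real model.

```python
def generate_until(step, prompt, stop_token="</s>", max_new=20):
    """Autoregressively extend `prompt` until stop_token or budget exhausted."""
    tokens = list(prompt)
    for _ in range(max_new):
        nxt = step(tokens)  # hypothetical next-token function
        if nxt == stop_token:
            break
        tokens.append(nxt)
    return tokens

# Toy deterministic "model" that emits a fixed answer, then the stop token.
script = iter(["The", "answer", "is", "B", "</s>"])
out = generate_until(lambda toks: next(script), ["Q:"])
```

Unlike log-likelihood scoring, this mode requires a parsing step afterwards to extract the answer, which can itself introduce evaluation noise.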
Layer Pruning: Removing an entire Transformer block from the network and connecting the previous block directly to the next
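A toy sketch of the rewiring, assuming blocks are plain functions composed in sequence; the scaling "blocks" are invented for illustration and carry none of a real Transformer block's internals.

```python
def make_block(scale):
    # Stand-in for a Transformer block: here just a residual-style add.
    return lambda x: x + scale

blocks = [make_block(s) for s in (1, 2, 3, 4)]

def forward(x, layers):
    for block in layers:
        x = block(x)
    return x

# Prune the block at index 2: block 1's output now feeds block 3 directly.
pruned = blocks[:2] + blocks[3:]
```

Because each block reads and writes the same residual-stream shape, removing one leaves the remaining composition well-typed, which is what makes this form of pruning architecturally cheap.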
KV Retrieval: A task where the model must retrieve a specific value associated with a key from its context or memory
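A sketch of how such a task instance can be built synthetically; the prompt format and helper name are assumptions, not a specific benchmark's layout.

```python
import random

def make_kv_retrieval_example(n_pairs=5, seed=0):
    """Build a synthetic prompt: n key-value pairs, then a query for one key."""
    rng = random.Random(seed)
    pairs = {f"key{i}": rng.randint(100, 999) for i in range(n_pairs)}
    target = rng.choice(list(pairs))
    context = "\n".join(f"{k} = {v}" for k, v in pairs.items())
    prompt = f"{context}\nWhat is the value of {target}?"
    return prompt, pairs[target]
```

Scaling `n_pairs` up turns this into a long-context stress test: the correct value is always present verbatim, so errors isolate retrieval failures rather than knowledge gaps.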
CoT: Chain-of-Thought—a prompting strategy where the model generates intermediate reasoning steps before the final answer
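A small illustration of the prompting difference and a common answer-parsing convention; the question, the "Let's think step by step" elicitation, and the `answer is` marker are all conventional examples, not a fixed standard.

```python
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
direct_prompt = f"Q: {question}\nA:"
# CoT variant: elicit intermediate reasoning before the final answer.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

def extract_final_answer(completion):
    """Take the text after the last 'answer is' marker (a common convention)."""
    marker = "answer is"
    if marker not in completion:
        return completion.strip()
    return completion.rsplit(marker, 1)[-1].strip().rstrip(".")
```

Because CoT completions are free-form, the final answer must be parsed out, which is why CoT evaluation is typically paired with the generation-until mode above rather than option scoring.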
Distillation: Training a smaller student model to mimic the behavior or output distribution of a larger teacher model
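A minimal sketch of one common distillation objective, the KL divergence between temperature-softened teacher and student distributions; the function names and temperature choice are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T
```

The temperature T > 1 softens both distributions so the student also learns the teacher's relative preferences among wrong answers, not just its top choice; the T² factor keeps gradient magnitudes comparable across temperatures.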
Delta Model: The difference in weights or representations between a base model and its fine-tuned or distilled variant, analyzed to characterize what the adaptation changed
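A sketch of the weight-space version of this analysis, using flat name-to-scalar dicts as a stand-in for real parameter tensors; the helper names are hypothetical.

```python
def weight_delta(base, tuned):
    """Per-parameter difference tuned - base (flat dicts: name -> weight)."""
    return {name: tuned[name] - base[name] for name in base}

def delta_norm(delta):
    """L2 norm of the delta -- a rough size of the adaptation update."""
    return sum(v * v for v in delta.values()) ** 0.5
```

In practice the delta is computed per tensor, and its low effective rank is what motivates methods like LoRA, which parameterize the update directly instead of recovering it after the fact.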
GQA: Grouped Query Attention—an efficiency technique where multiple query heads share a single key-value head
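A sketch of the grouping rule at the heart of GQA, assuming the query-head count divides evenly by the KV-head count; the head counts shown are illustrative.

```python
def kv_head_for_query(q_head, n_q_heads=8, n_kv_heads=2):
    """Map a query head to the KV head its group shares."""
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size
```

The two extremes fall out of the same rule: `n_kv_heads == n_q_heads` recovers standard multi-head attention (every query head has its own KV head), and `n_kv_heads == 1` is multi-query attention. Intermediate values shrink the KV cache by the grouping factor.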