Delta parameters: The difference between the parameters of a fine-tuned model and its pre-trained base model (θ_SFT - θ_PRE)
Homologous models: Models that share the same pre-trained backbone architecture and initialization (e.g., multiple Llama-2-7B models tuned on different data)
SFT: Supervised Fine-Tuning—adapting a pre-trained model to a specific task using labeled data
Task Arithmetic: A model merging technique that adds task-specific vectors (delta parameters) to a base model, often scaled by a coefficient λ
Bernoulli distribution: A probability distribution taking value 1 with probability p and 0 with probability 1-p, used here for the random drop mask
AlpacaEval: A benchmark for evaluating instruction-following capabilities of language models
GSM8K: Grade School Math 8K—a benchmark dataset of high-quality grade school math word problems
MBPP: Mostly Basic Python Programming—a benchmark for evaluating code generation
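Several of the terms above (delta parameters, the Bernoulli drop mask, Task Arithmetic) combine into a single merging procedure. The following is a minimal NumPy sketch of that combination, assuming parameters are flat 1-D arrays; the helper name `dare_merge` and the drop rate `p` default are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def dare_merge(theta_pre, theta_sfts, p=0.9, lam=1.0):
    """Merge homologous fine-tuned models into one base model (sketch).

    For each fine-tuned model: compute the delta parameters
    delta = theta_sft - theta_pre, drop each entry with probability p
    (a Bernoulli mask), rescale the survivors by 1/(1-p) so the merge
    is unbiased in expectation, then apply Task Arithmetic:
    theta_merged = theta_pre + lam * sum(processed deltas).
    """
    merged_delta = np.zeros_like(theta_pre)
    for theta_sft in theta_sfts:
        delta = theta_sft - theta_pre             # delta parameters
        keep = rng.random(delta.shape) >= p       # Bernoulli keep mask, P(keep) = 1-p
        merged_delta += np.where(keep, delta, 0.0) / (1.0 - p)  # drop and rescale
    return theta_pre + lam * merged_delta         # Task Arithmetic with coefficient lam

# With p = 0 nothing is dropped, so this reduces to plain Task Arithmetic.
theta_pre = np.zeros(4)
theta_sft = np.ones(4)
merged = dare_merge(theta_pre, [theta_sft], p=0.0, lam=0.5)
```

Because each surviving delta entry is divided by 1-p, the expected value of the merged delta matches the undropped delta, which is why high drop rates can still preserve task performance.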