SFT: Supervised Fine-Tuning—the process of training a pre-trained model on labeled instruction-response pairs to improve instruction following
Foundation Model: A large-scale pre-trained model (like Llama-2) that has not yet undergone instruction tuning
Prior Tokens: Initial tokens of the target sequence manually appended to the input prompt to guide the model's generation process
Silent Majority: Tokens that receive relatively high probability under the foundation model's distribution but are not its single top (argmax) choice; these often contain the correct task-specific output
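A minimal sketch of the idea: given a next-token distribution, collect the tokens that carry non-trivial probability mass but are not the argmax. The distribution, token strings, and the 0.05 threshold are all hypothetical, chosen only for illustration.

```python
def silent_majority(probs, threshold=0.05):
    # Tokens with non-trivial probability that are NOT the argmax choice.
    top = max(probs, key=probs.get)
    return {tok: p for tok, p in probs.items() if tok != top and p >= threshold}

# Hypothetical next-token distribution: the argmax continues in English,
# while the desired German output sits just below it.
probs = {"Hello": 0.40, "Hallo": 0.35, "Hi": 0.15, "the": 0.01}
print(silent_majority(probs))  # {'Hallo': 0.35, 'Hi': 0.15}
```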
KL Divergence: Kullback-Leibler Divergence—a measure (asymmetric, so not a true metric) of how one probability distribution differs from another, used here to quantify the gap between the foundation model's and the SFT model's output distributions
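For two discrete distributions P and Q over the same vocabulary, KL(P || Q) = Σᵢ pᵢ log(pᵢ / qᵢ). A small self-contained sketch:

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i).
    # Asymmetric, always >= 0, and 0 only when P == Q.
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions diverge by exactly 0.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Note the asymmetry: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why KL is called a divergence rather than a distance.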
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and trains small rank-decomposition matrices
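The LoRA update can be sketched in a few lines: the frozen weight W is augmented with a trainable low-rank product, giving an effective weight W' = W + (α/r)·BA. The toy sizes, identity W, and initial values below are illustrative assumptions, not the method's actual configuration.

```python
def matmul(X, Y):
    # Naive matrix multiply: (m x k) @ (k x n) -> (m x n).
    m, k, n = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(n)] for i in range(m)]

# Frozen pre-trained weight W (d_out x d_in); identity here for illustration.
d_out, d_in, r = 4, 4, 2
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]

# LoRA trains only the small factors B (d_out x r) and A (r x d_in).
B = [[0.0] * r for _ in range(d_out)]   # B starts at zero, so W' == W initially
A = [[0.1] * d_in for _ in range(r)]
alpha = 1.0

# Effective weight: W' = W + (alpha / r) * B @ A
delta = matmul(B, A)
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d_in)] for i in range(d_out)]
```

Because B is initialized to zero, fine-tuning starts from the pre-trained behavior exactly; gradients only flow into A and B, which together hold far fewer parameters than W.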
PRETTY: Prefix Text as a Yarn—the authors' proposed method of using prior tokens to elicit aligned behavior from a foundation model without any parameter training
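At the prompt level, the idea amounts to concatenating a few prior tokens (the start of a plausible target sequence) after the task prompt, so the model continues from them instead of generating from scratch. The helper name, example prompt, and prior token below are hypothetical; real usage would operate on tokenizer IDs rather than whitespace-joined strings.

```python
def build_prior_token_input(prompt, prior_tokens):
    # Append the initial tokens of the desired target to the prompt so the
    # model's decoding continues from this prefix.
    return prompt + " " + " ".join(prior_tokens)

prompt = "Translate to German: Hello, world!"
# Hypothetical prior token: the first word of a plausible German target.
print(build_prior_token_input(prompt, ["Hallo"]))
# Translate to German: Hello, world! Hallo
```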
POS tagging: Part-of-Speech tagging—the task of assigning grammatical categories (like noun, verb) to words in a text