SFT: Supervised Fine-Tuning—further training a pre-trained model on labeled instruction-response pairs to align it with human intent
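A minimal sketch of the SFT objective, assuming per-token probabilities are already computed; the function name, the toy values, and the convention of excluding prompt tokens from the loss are all illustrative:

```python
import math

def sft_loss(target_probs, response_mask):
    # Cross-entropy over the labeled pair, averaged over response tokens only;
    # prompt tokens are commonly excluded from the loss (an assumed convention).
    terms = [-math.log(p) for p, m in zip(target_probs, response_mask) if m]
    return sum(terms) / len(terms)

# Probabilities the model assigns to each gold token; 0 marks prompt positions.
loss = sft_loss([0.9, 0.5, 0.5], [0, 1, 1])
```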
One-to-many nature: The linguistic property where a single intent or meaning can be validly expressed by multiple different token sequences
Logits: The raw, unnormalized output scores from the model's last layer, before being converted to probabilities via softmax
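For instance, the softmax conversion can be sketched in a few lines (the logit values here are made up):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Raw last-layer scores for a toy 3-token vocabulary (illustrative values).
probs = softmax([2.0, 1.0, 0.1])
```

The resulting probabilities sum to 1 and preserve the ordering of the logits.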
Stop-gradient: An operator in computational graphs that prevents error gradients from flowing back through a specific variable during training updates
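The effect can be seen in a toy scalar autodiff sketch (a minimal illustration, not a real framework): detaching one factor of a product changes the gradient from 2x to x.

```python
class Value:
    """Toy scalar autodiff node; a minimal sketch, not a real framework."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's grad is
        # complete before it is propagated to its parents.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

def stop_gradient(v):
    # Same value, detached: no parents and no backward rule, so gradients
    # cannot flow back through it.
    return Value(v.data)

x = Value(3.0)
(x * x).backward()                    # d(x*x)/dx = 2x = 6
x2 = Value(3.0)
(stop_gradient(x2) * x2).backward()   # sg(x) treated as a constant: grad = 3
```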
Jacobian: A matrix of all first-order partial derivatives of a vector-valued function; here representing how model outputs change with respect to parameters
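Numerically, a Jacobian can be approximated entry-by-entry with finite differences; `jacobian_fd` below is a hypothetical helper for illustration:

```python
def jacobian_fd(f, x, eps=1e-6):
    # J[i][j] approximates d f_i / d x_j via forward differences.
    fx = f(x)
    J = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        fxp = f(xp)
        for i in range(len(fx)):
            J[i][j] = (fxp[i] - fx[i]) / eps
    return J

# f(x, y) = (x*y, x + y) has exact Jacobian [[y, x], [1, 1]].
J = jacobian_fd(lambda v: [v[0] * v[1], v[0] + v[1]], [2.0, 3.0])
```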
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning method that updates only small low-rank matrices added to the original weights
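A numeric sketch of the low-rank update W' = W + BA; the toy dimensions and the zero initialization of B (so fine-tuning starts exactly from the base model) are illustrative assumptions:

```python
import random

def matmul(A, B):
    # Plain-Python matrix product for the toy example.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

d, r = 8, 2                                                        # toy sizes: d x d weight, rank-r adapter
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen pre-trained weight
B = [[0.0] * r for _ in range(d)]                                  # trainable d x r, zero-initialized
A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # trainable r x d

delta = matmul(B, A)  # rank <= r update; only B and A receive gradients
W_adapted = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]
# Trainable parameters: 2*d*r = 32 here, versus d*d = 64 for a full update.
```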
Hard masking: A binary filtering technique where data points (tokens) are either fully included or fully excluded from the loss calculation based on a threshold
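A sketch of threshold-based hard masking over per-token probabilities; the 0.5 threshold and the choice to drop tokens below it are illustrative, not taken from the source:

```python
import math

def hard_masked_loss(token_probs, threshold=0.5):
    # Binary include/exclude: tokens failing the threshold contribute nothing
    # to the loss, not even a down-weighted term.
    kept = [p for p in token_probs if p >= threshold]
    if not kept:
        return 0.0, 0
    return sum(-math.log(p) for p in kept) / len(kept), len(kept)

loss, n_kept = hard_masked_loss([0.9, 0.2, 0.8])  # the 0.2 token is dropped
```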
DFT: Dynamic Fine-Tuning—a baseline method that reweights token losses based on confidence rather than masking them entirely
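By contrast with hard masking, confidence-based reweighting can be sketched as scaling each token's loss by its probability; this is an illustrative form, and the exact weighting DFT uses may differ:

```python
import math

def reweighted_loss(token_probs):
    # Every token stays in the loss; low-confidence tokens are scaled down
    # smoothly instead of being cut off at a threshold. In a real training
    # loop the weight p would be gradient-detached (stop-gradient); here the
    # probabilities are plain floats, so that is implicit.
    return sum(p * -math.log(p) for p in token_probs) / len(token_probs)

loss = reweighted_loss([0.9, 0.2, 0.8])  # the 0.2 token is kept, but down-weighted
```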