Full fine-tuning: Updating all parameters of a pre-trained model (encoder + head) during training
Head tuning: Freezing the pre-trained encoder and updating only the final linear classification layer (also called linear probing)
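The encoder-frozen, head-only split can be sketched in a toy setup. Here a fixed random projection stands in for the frozen pre-trained encoder, and only the linear head's weights are updated by gradient descent; all data and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative, not from the source).
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = (X @ true_w > 0).astype(float)

# Frozen "encoder": a fixed projection that is never updated.
W_enc = rng.normal(size=(10, 16))
def encode(x):
    return np.tanh(x @ W_enc)  # frozen features

# Trainable head: logistic regression on the frozen features.
w_head = np.zeros(16)
b_head = 0.0
lr = 0.5
feats = encode(X)
for _ in range(500):
    logits = feats @ w_head + b_head
    p = 1.0 / (1.0 + np.exp(-logits))
    grad_w = feats.T @ (p - y) / len(y)  # gradient w.r.t. the head only
    grad_b = np.mean(p - y)
    w_head -= lr * grad_w                # encoder weights stay untouched
    b_head -= lr * grad_b

acc = np.mean(((feats @ w_head + b_head) > 0) == (y == 1))
print(f"head-tuning accuracy: {acc:.2f}")
```

Full fine-tuning would additionally take gradients through `W_enc`; linear probing leaves it fixed by construction.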
Leave-one-out stability: A measure of algorithmic stability defined by the difference in model predictions when one training sample is removed from the dataset
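This definition can be made concrete with a toy model: fit a simple 1-D least-squares predictor on the full training set, refit with each sample removed, and record how much the prediction at a fixed test point shifts. The model and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)

def fit_slope(xs, ys):
    # Least squares through the origin: slope = <x, y> / <x, x>.
    return xs @ ys / (xs @ xs)

x_test = 0.5
full_pred = fit_slope(x, y) * x_test

# Leave each training sample out and measure the prediction shift.
shifts = []
for i in range(len(x)):
    mask = np.arange(len(x)) != i
    loo_pred = fit_slope(x[mask], y[mask]) * x_test
    shifts.append(abs(loo_pred - full_pred))

print(f"max leave-one-out prediction shift: {max(shifts):.4f}")
```

A small maximum shift means the algorithm is stable: no single training sample dominates the learned predictor.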
Lipschitz constant: An upper bound on how fast a function's output can change relative to its input; a lower constant implies the loss function is smoother and less sensitive to small input changes
Taylor expansion: Approximating a complex function (like a neural network loss) using a sum of terms calculated from the values of its derivatives at a single point; the full series is infinite, but in practice it is truncated after the first few terms
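The truncation is easy to see on a function whose derivatives are known: approximating exp(x) around 0 by its first n Taylor terms x^k / k!, the error shrinks rapidly as terms are added.

```python
import math

def taylor_exp(x, n_terms):
    # Partial sum of the Taylor series of exp around 0.
    return sum(x**k / math.factorial(k) for k in range(n_terms))

x = 0.5
for n in (2, 4, 8):
    approx = taylor_exp(x, n)
    err = abs(approx - math.exp(x))
    print(f"{n} terms: {approx:.6f}  (error {err:.2e})")
```

In the fine-tuning analysis, the same idea applies with the loss in place of exp: a low-order expansion around the pre-trained weights is accurate only while the weights stay close to that point.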
Max-margin classifier: A classifier (like SVM) that maximizes the distance (margin) between the decision boundary and the nearest data points of any class
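The margin in this definition is a concrete quantity: for a hyperplane w·x + b = 0, each point's distance to the boundary is |w·x + b| / ||w||, and the margin is the smallest such distance. A minimal sketch on toy 2-D points (a hand-picked candidate hyperplane, not a trained SVM):

```python
import numpy as np

pts = np.array([[1.0, 1.0], [2.0, 2.0],       # class +1
                [-1.0, -1.0], [-2.0, -1.5]])  # class -1
labels = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])  # candidate hyperplane: x + y = 0
b = 0.0

distances = np.abs(pts @ w + b) / np.linalg.norm(w)
margin = distances.min()
correct = np.all(np.sign(pts @ w + b) == labels)
print(f"separates: {correct}, margin: {margin:.3f}")
```

An SVM searches over w and b to make this minimum distance as large as possible among all separating hyperplanes.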
SURT: Self Unsupervised Re-Training—a proposed method that re-trains the model with masked language modeling on the target data, reducing the distance between the initial and final weights
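The masked-language-modeling objective that SURT re-trains with can be illustrated by its data preparation step: randomly replace a fraction of tokens with a [MASK] symbol and keep the originals as prediction targets. The 15% rate and the [MASK] token follow BERT-style conventions and are assumptions here, not details from the source.

```python
import random

def mask_tokens(tokens, rate=0.15, seed=0):
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append("[MASK]")
            targets.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)  # position not scored
    return masked, targets

tokens = "the model is re-trained on the target data".split()
masked, targets = mask_tokens(tokens)
print(masked)
```

Training the encoder to fill in these masks on target-domain text adapts it before the supervised fine-tuning stage begins.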
MMR: Maximal Margin Regularizer—a proposed method to maximize the distance between encoded features of different classes
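One way to realize a regularizer in the spirit of MMR is to compute per-class centroids of the encoded features and penalize small distances between them, so that minimizing the penalty pushes the classes apart. This exact formulation is an assumption for illustration; the proposed method's regularizer may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
feats = rng.normal(size=(60, 8))        # toy encoded features
labels = rng.integers(0, 3, size=60)    # three toy classes

def mmr_penalty(feats, labels):
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Negative minimum pairwise centroid distance: lower = better separated.
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(len(classes)) for j in range(i + 1, len(classes))]
    return -min(dists)

print(f"penalty: {mmr_penalty(feats, labels):.3f}")
```

Adding this term to the training loss rewards the encoder for producing features whose classes sit far apart, which is what a downstream max-margin head benefits from.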
MHLoss: Multi-Head Loss—a proposed method using multiple linear heads simultaneously to accelerate convergence
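The multi-head idea above can be sketched as applying K independent linear heads to the same features and combining their losses. Averaging the per-head losses and using squared error are assumptions chosen for a minimal illustration; the proposed method's exact loss may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
feats = rng.normal(size=(32, 8))   # toy encoded features
y = rng.normal(size=32)            # toy regression targets
K = 4
heads = [rng.normal(size=8) for _ in range(K)]  # K independent linear heads

def mh_loss(feats, y, heads):
    # One loss per head, averaged into a single training objective.
    losses = [np.mean((feats @ w - y) ** 2) for w in heads]
    return sum(losses) / len(losses)

loss = mh_loss(feats, y, heads)
print(f"multi-head loss: {loss:.3f}")
```

Because every head contributes a gradient through the shared features, each update carries more signal than a single head would provide, which is the intuition behind the claimed faster convergence.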