PEFT: Parameter-Efficient Fine-Tuning—methods to adapt large models by updating only a small subset of parameters.
PLM: Pretrained Language Model—models like BERT, T5, or LLaMA trained on vast corpora.
Adapter: Small trainable neural network modules inserted between layers of a frozen pretrained model.
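A minimal sketch of a bottleneck adapter in PyTorch (class and parameter names are illustrative, not from any particular library): a down-projection, a nonlinearity, an up-projection, and a residual connection, with the up-projection zero-initialized so the module starts as an identity function.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted into a frozen model (illustrative sketch)."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project to small bottleneck
        self.up = nn.Linear(bottleneck, dim)     # project back to model dim
        # Zero-init the up-projection so the adapter is a no-op at the start
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact
        return h + self.up(torch.relu(self.down(h)))
```

Only the adapter's parameters are trained; the surrounding pretrained layers stay frozen.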
Soft Prompt: Learnable continuous vectors prepended to inputs or hidden states to guide the model without changing weights.
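The prepending step can be sketched as follows (a hypothetical minimal module, not a specific library's API): a learned matrix of prompt vectors is concatenated in front of each sequence's input embeddings.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable continuous prompt vectors prepended to input embeddings (sketch)."""
    def __init__(self, n_tokens: int, dim: int):
        super().__init__()
        # These vectors are the only trainable parameters
        self.prompt = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, dim)
        batch = input_embeds.size(0)
        p = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([p, input_embeds], dim=1)  # (batch, n_tokens + seq_len, dim)
```

The frozen model then processes the lengthened sequence; gradients flow only into `self.prompt`.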
Catastrophic Forgetting: The tendency of a neural network to abruptly lose previously learned knowledge when trained on new data; PEFT methods mitigate it by keeping most pretrained weights frozen.

LoRA: Low-Rank Adaptation—a reparameterization method that injects trainable low-rank decomposition matrices into model layers.
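A minimal LoRA sketch (names and hyperparameters are illustrative): a frozen linear layer is augmented with the trainable low-rank product B·A, scaled by α/r. With B zero-initialized, the layer reproduces the pretrained output at the start of training.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update (illustrative sketch)."""
    def __init__(self, in_features: int, out_features: int, r: int = 4, alpha: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weights stay frozen
        self.base.bias.requires_grad_(False)
        # Low-rank factors: effective weight is W + (alpha/r) * B @ A
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

After training, B·A can be merged into the frozen weight, so LoRA adds no extra cost at inference time.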
FFN: Feed-Forward Network—the component of a Transformer block consisting of two linear transformations with a nonlinear activation in between, applied position-wise.
Prefix-tuning: A method that prepends learnable vectors (prefixes) to the keys and values in the self-attention mechanism.
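The key/value prepending can be sketched as follows (a hypothetical helper module; real implementations usually generate the prefixes via a reparameterization MLP, omitted here for brevity): learned per-head prefix keys and values are concatenated before the sequence's own keys and values inside each attention layer.

```python
import torch
import torch.nn as nn

class PrefixKV(nn.Module):
    """Learnable prefix keys/values for one attention layer (illustrative sketch)."""
    def __init__(self, n_prefix: int, n_heads: int, head_dim: int):
        super().__init__()
        self.k = nn.Parameter(torch.randn(n_prefix, n_heads, head_dim) * 0.02)
        self.v = nn.Parameter(torch.randn(n_prefix, n_heads, head_dim) * 0.02)

    def forward(self, keys: torch.Tensor, values: torch.Tensor):
        # keys/values: (batch, seq_len, n_heads, head_dim)
        b = keys.size(0)
        k = self.k.unsqueeze(0).expand(b, -1, -1, -1)
        v = self.v.unsqueeze(0).expand(b, -1, -1, -1)
        # Queries attend over the prefix positions as well as the real tokens
        return torch.cat([k, keys], dim=1), torch.cat([v, values], dim=1)
```

Unlike soft prompts, the prefixes act at every layer's attention rather than only at the input embeddings.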
Inference Efficiency: The latency, memory, and computational cost of generating predictions after training. PEFT methods differ here: adapters and prefixes add overhead at inference time, while LoRA updates can be merged into the base weights to avoid any extra cost.