Masked Language Modeling: A pre-training task where random tokens in a sequence are hidden, and the model must predict them based on the surrounding context
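A minimal sketch of the masking step, in plain Python (the function name and masking rate are illustrative assumptions, not from any particular library): a random subset of tokens is replaced by a mask symbol, and only the masked positions carry prediction targets.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens; the model must predict the originals."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)    # loss is computed only at this position
        else:
            masked.append(tok)
            targets.append(None)   # unmasked positions contribute no loss
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), mask_prob=0.3)
```

In real pre-training setups (e.g. BERT) the masking is re-sampled each epoch and a fraction of selected tokens is kept or replaced by random tokens rather than always masked; the sketch above keeps only the core idea.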
Causal Language Modeling: A pre-training task where the model predicts the next token in a sequence based only on preceding tokens (auto-regressive)
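The auto-regressive objective can be shown by expanding a sequence into (prefix, next-token) training pairs; this toy helper (a hypothetical name, for illustration) makes the "preceding tokens only" constraint explicit:

```python
def causal_pairs(tokens):
    """Each position is predicted from its prefix only, never from later tokens."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# "the cat sat" yields: (["the"], "cat"), (["the", "cat"], "sat")
pairs = causal_pairs("the cat sat".split())
```

In practice a causal attention mask lets a Transformer compute all of these predictions in one pass, but the supervision signal is exactly these pairs.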
Permuted Language Modeling: A variation where the prediction order of tokens is shuffled, allowing the model to exploit bidirectional context without introducing artificial masking symbols into the input
Denoising Autoencoder: A model trained to reconstruct an original input from a corrupted version (e.g., with deleted or masked tokens)
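A sketch of one simple corruption scheme, token deletion (the function and deletion rate are illustrative assumptions; real denoisers also use span masking, shuffling, etc.): the corrupted sequence is the model's input, and the uncorrupted original is the reconstruction target.

```python
import random

def corrupt_by_deletion(tokens, delete_prob=0.2, seed=1):
    """Drop random tokens; the training target is the full original sequence."""
    rng = random.Random(seed)
    corrupted = [t for t in tokens if rng.random() >= delete_prob]
    return corrupted, tokens  # (model input, reconstruction target)

noisy, target = corrupt_by_deletion("the quick brown fox jumps".split())
```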
In-context Learning: The ability of a model to perform a task by observing examples (demonstrations) within the input prompt, without parameter updates
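In-context learning is driven purely by how the prompt is assembled; this hypothetical helper shows a common few-shot layout (the "Input:/Output:" template is an assumption, not a fixed standard) in which demonstrations precede the query and no parameters change:

```python
def few_shot_prompt(demos, query):
    """Build a prompt from (input, output) demonstrations plus a final query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(blocks)

prompt = few_shot_prompt([("2+2", "4"), ("3+5", "8")], "7+6")
```

The frozen model then continues the text after the final "Output:", inferring the task from the demonstrations alone.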
NSP (Next Sentence Prediction): A pre-training task used in BERT where the model predicts whether one sentence immediately follows another in the original text
Zero-shot Learning: Performing a task without seeing any task-specific training examples, relying only on the task instructions given in the prompt
Fine-tuning: Updating the parameters of a pre-trained model on a specific downstream dataset to improve performance on that task
Prompting: Providing input text (instructions or examples) to a frozen model to guide it toward a specific output