CPT: Continual Pre-training—training an already pre-trained model on new data (usually domain-specific) to update its knowledge.
SFT: Supervised Fine-Tuning—training a model on input-output pairs (instructions and responses) to teach it how to follow tasks.
Catastrophic Forgetting: The tendency of a neural network to abruptly lose much of its previously learned information when trained on new data.
RAG: Retrieval-Augmented Generation—systems that retrieve external documents to help an LLM answer questions.
Multi-hop reasoning: Answering questions that require connecting multiple pieces of information from different sources.
Long-tail knowledge: Facts that appear very infrequently in the training data (e.g., obscure historical events or rarely mentioned entities).
Reversal curse: The phenomenon where an LLM trained on 'A is B' fails to generalize to the reverse direction, 'B is A' (i.e., it can answer 'What is A?' with B but cannot answer 'What is B?' with A).
Entity extraction: Identifying named entities such as people, places, or organizations within a text.
Paraphrasing: Rewriting the same fact in multiple different ways to increase data diversity for the model.
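
To make the paraphrasing and reversal curse entries concrete, here is a minimal sketch of template-based paraphrasing that states one fact in both directions. The templates, the function name `paraphrase_fact`, and the example fact are all hypothetical and chosen only for illustration. Real pipelines often use an LLM to generate rewrites rather than fixed templates.

```python
from typing import List

# Hypothetical templates (assumptions for illustration, not from any dataset).
FORWARD_TEMPLATES = [
    "{a} is {b}.",
    "It is well known that {a} is {b}.",
]
# Reverse-direction templates state the same fact starting from B.
REVERSE_TEMPLATES = [
    "{b} is {a}.",
    "When asked about {b}, the answer is {a}.",
]


def _capitalize_first(sentence: str) -> str:
    """Upper-case only the first character, leaving the rest untouched."""
    return sentence[:1].upper() + sentence[1:]


def paraphrase_fact(a: str, b: str, include_reverse: bool = True) -> List[str]:
    """Return template-based rewrites of the fact 'a is b'.

    With include_reverse=True, the output also contains 'b is a' style
    sentences, so both directions of the fact appear in the data.
    """
    templates = list(FORWARD_TEMPLATES)
    if include_reverse:
        templates += REVERSE_TEMPLATES
    return [_capitalize_first(t.format(a=a, b=b)) for t in templates]


if __name__ == "__main__":
    for sentence in paraphrase_fact("Paris", "the capital of France"):
        print(sentence)
```

Including the reverse templates is one simple way to expose a model to both 'A is B' and 'B is A' phrasings. Whether that is enough to avoid the reversal curse in practice depends on the model and data, and the sketch makes no claim about that.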