In-Context Learning (ICL): The ability of a model to perform a task by conditioning on a few examples (demonstrations) provided in the prompt, without updating its weights
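To make the definition concrete, here is a minimal sketch of assembling a few-shot ICL prompt for sentiment classification. The template and demonstrations are invented for illustration, not taken from the paper.

```python
# Hypothetical sketch: build a few-shot ICL prompt by concatenating
# (input, label) demonstrations followed by the unlabeled query.
def build_icl_prompt(demonstrations, query):
    """Format demonstrations and the query with a shared template."""
    lines = [f"Review: {text}\nSentiment: {label}"
             for text, label in demonstrations]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"),
         ("Dull and predictable.", "negative")]
prompt = build_icl_prompt(demos, "Surprisingly moving.")
print(prompt)
```

The model is then asked to continue the prompt after the final "Sentiment:", with no parameter updates involved.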
Supportive Pretraining Data: A specific subset of pretraining data that, if the model is trained on it, disproportionately improves performance on a target capability (here, ICL)
ORCA-ICL: The algorithm used in this paper to identify supportive pretraining data by comparing the gradients induced by pretraining examples with the gradients induced by ICL task data
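The gradient-comparison idea can be sketched as scoring each pretraining example by how well its loss gradient aligns with the ICL task gradient, then keeping the top-scoring examples. The toy gradient vectors below are invented; in practice they come from backpropagation through the language model.

```python
# Illustrative sketch (not the paper's implementation): rank
# pretraining examples by cosine similarity between each example's
# loss gradient and the ICL task loss gradient.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_supportive(pretraining_grads, icl_task_grad, k):
    """Return indices of the k examples whose gradients align best."""
    scores = [(cosine(g, icl_task_grad), i)
              for i, g in enumerate(pretraining_grads)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

grads = [[1.0, 0.0], [0.7, 0.7], [-1.0, 0.2]]  # toy per-example gradients
task_grad = [1.0, 1.0]                          # toy ICL task gradient
print(rank_supportive(grads, task_grad, 2))     # → [1, 0]
```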
Perturbative Continued Pretraining: Running a very small number of gradient-descent training steps on an already-pretrained model, using a specific data subset, in order to measure that subset's impact
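A minimal sketch of the idea on a toy one-parameter linear model, under invented data: take "pretrained" weights, run a handful of SGD steps on a candidate subset, and check whether loss on a proxy target task went down. Everything here is illustrative, not the paper's setup.

```python
# Toy perturbative continued pretraining: a few small SGD steps on a
# candidate subset, then measure the change in target-task loss.
def mse_loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd_steps(w, data, lr=0.05, steps=3):
    """A very small number of gradient steps (the 'perturbation')."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrained_w = 0.5                    # stand-in for pretrained weights
subset = [(1.0, 1.0), (2.0, 2.0)]     # candidate pretraining subset
target_task = [(3.0, 3.0)]            # proxy for the target capability

before = mse_loss(pretrained_w, target_task)
after = mse_loss(sgd_steps(pretrained_w, subset), target_task)
print(after < before)  # supportive if the perturbation reduced target loss
```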
Zero-shot Prompting: Asking the model to perform a task without providing any examples/demonstrations
Verbalizer: A mapping that converts task labels (e.g., class indices) into natural language words (e.g., 'positive', 'negative') for the language model to predict
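A small sketch of a verbalizer in use: class indices map to label words, and classification picks whichever label word the model scores higher after the prompt. `lm_logprob` is a hypothetical stand-in for a real model's scoring function.

```python
# Verbalizer sketch: map class indices to natural-language label words,
# then classify by comparing the LM's scores for those words.
VERBALIZER = {0: "negative", 1: "positive"}

def lm_logprob(prompt, word):
    # Toy scorer (assumption, not a real LM): prefers "positive"
    # exactly when the prompt contains the word "great".
    return 1.0 if ("great" in prompt) == (word == "positive") else 0.0

def classify(prompt):
    scores = {label: lm_logprob(prompt, word)
              for label, word in VERBALIZER.items()}
    return max(scores, key=scores.get)

print(VERBALIZER[classify("This movie was great. Sentiment:")])  # → positive
```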
OPT: Open Pre-trained Transformer—a suite of open-source large language models developed by Meta
Information Gain: A measure used here to quantify how much the preceding context reduces the uncertainty of the current token prediction
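One simple way to see this quantity: compare the entropy of the next-token distribution without context against the entropy once the context is supplied; the drop in entropy is the information gained. The two distributions below are made up for illustration.

```python
# Sketch: information gain as the reduction in next-token entropy
# once the preceding context is taken into account.
import math

def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

no_context = [0.25, 0.25, 0.25, 0.25]    # uniform: maximally uncertain
with_context = [0.85, 0.05, 0.05, 0.05]  # context concentrates the mass

gain = entropy(no_context) - entropy(with_context)
print(round(gain, 3))
```

A larger gain means the context did more to reduce the model's uncertainty about the current token.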