BPE: Byte-Pair Encoding—a tokenization algorithm that starts from individual characters and iteratively merges the most frequent adjacent symbol pair to form subword units
WordPiece: A tokenization algorithm similar to BPE but based on likelihood maximization, commonly used in BERT
probing classifier: A simple model (usually a linear layer or MLP) trained on top of a frozen model's representations to test if specific information is encoded
FineWeb: A high-quality web text dataset used for pre-training
nanoGPT: A small-scale implementation of the GPT architecture, used here for efficient experimentation
WordSub: A data transformation where every vocabulary token is replaced by a consistent random string, preserving syntax but destroying form-meaning correlations
CharPert: A data transformation where every character is replaced by a random character, destroying orthographic regularities while preserving token length
controlled tokenizer: A custom tokenizer with manually defined merge rules designed to test specific hypotheses about character adjacency statistics
orthographic constraints: Rules governing which character sequences are permissible in a language (e.g., 'ck' rarely begins an English word)
MLP: Multilayer Perceptron—a basic feedforward neural network used here as the probe
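The BPE entry above can be illustrated with a minimal merge loop. This is a toy sketch (the corpus, word frequencies, and merge count are arbitrary), not the tokenizer used in the experiments:

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    """Replace every occurrence of the pair with its merged symbol."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in words.items()}

# Toy corpus: words as space-separated symbols, with frequencies.
words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges = []
for _ in range(4):  # the number of merges is a hyperparameter
    pairs = get_pair_counts(words)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    words = merge_pair(best, words)
    merges.append(best)
print(merges)  # learned merge rules, in order
```

Each learned merge becomes a rule applied in order at tokenization time, which is also how the controlled tokenizer's manually defined merge rules slot in.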
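The contrast between WordSub and CharPert can be made concrete with a schematic re-implementation of the two transformations as defined above. The lowercase alphabet and the choice to match each replacement's length to the original are assumptions for illustration:

```python
import random
import string

def word_sub(tokens, rng=None):
    """WordSub sketch: each distinct token maps to one consistent random
    string, so co-occurrence structure survives but token form does not."""
    rng = rng or random.Random(0)
    mapping = {}  # consistency: same token -> same replacement
    for tok in tokens:
        if tok not in mapping:
            # Assumed: random lowercase string of the same length.
            mapping[tok] = "".join(rng.choice(string.ascii_lowercase)
                                   for _ in range(len(tok)))
    return [mapping[tok] for tok in tokens]

def char_pert(tokens, rng=None):
    """CharPert sketch: every character is independently replaced, keeping
    token length but with no consistency across occurrences."""
    rng = rng or random.Random(0)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in tok)
            for tok in tokens]

tokens = ["the", "cat", "sat", "the", "cat"]
ws = word_sub(tokens)
cp = char_pert(tokens)
```

After WordSub, repeated tokens remain identical strings (syntax-preserving); after CharPert, they generally do not, and within-token orthographic regularities are destroyed.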
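The probing-classifier setup can likewise be sketched in a few lines: a linear probe trained on frozen representations, with synthetic data standing in for the model's activations (the dimensions, the encoded property, and the learning rate are all placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen model representations: 200 vectors of dim 16.
# Hypothetical setup: one feature dimension linearly encodes a binary property.
reps = rng.normal(size=(200, 16))
labels = (reps[:, 3] > 0).astype(float)

# Linear probe (logistic regression) trained by gradient descent.
# Only the probe's parameters are updated; the representations stay fixed.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(reps @ w + b)))   # probe predictions
    grad_w = reps.T @ (p - labels) / len(labels)  # cross-entropy gradient
    grad_b = (p - labels).mean()
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

acc = ((reps @ w + b > 0) == labels.astype(bool)).mean()
```

High probe accuracy is evidence that the property is (linearly) decodable from the representations; swapping the linear layer for a small MLP tests for nonlinearly encoded information.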