invertibility: The ability to accurately reconstruct an input sequence solely from a model's output embedding or hidden states
causal training: Training a model to predict the next token in a sequence from the preceding tokens only (the standard GPT-style objective)
autoencoder: A neural network architecture trained to compress an input into a latent representation and then reconstruct the original input from it
Hamming metric: A measure of accuracy defined here as the proportion of input tokens that are correctly identified in the reconstructed sequence
entropy ratio: A metric measuring the fraction of input information retained in an embedding, normalized by tokenizer size
FineWeb-edu: A large-scale dataset filtered from Common Crawl, used here for training and evaluation
FineMath: A mathematics-specific subset of the FineWeb dataset used for out-of-distribution testing
FLOP: Floating-point operation; the number of FLOPs is a measure of computational work
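
The Hamming metric defined above can be illustrated with a minimal sketch. It assumes that tokens are compared position by position and that any input position missing from a shorter reconstruction counts as an error; the glossary states neither detail, and the name `hamming_accuracy` is hypothetical.

```python
from typing import Sequence


def hamming_accuracy(input_ids: Sequence[int], reconstructed_ids: Sequence[int]) -> float:
    """Fraction of input positions whose token is reproduced exactly.

    Assumption of this sketch: comparison is position-wise, and the score
    is normalized by the input length.
    """
    if not input_ids:
        raise ValueError("input_ids must be non-empty")
    # zip stops at the shorter sequence, so input positions that the
    # reconstruction does not cover contribute no matches (counted as errors).
    matches = sum(a == b for a, b in zip(input_ids, reconstructed_ids))
    return matches / len(input_ids)
```

For example, `hamming_accuracy([5, 17, 42, 8], [5, 17, 99, 8])` returns `0.75`, since three of the four input tokens are recovered in place.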