AUI: Artificial Useful Intelligence—systems effective at specific real-world tasks but lacking adaptive reasoning
AGI: Artificial General Intelligence—systems capable of robust, adaptive reasoning across diverse domains
SPT: Supervised Pretraining—task-agnostic pretraining using next-token prediction loss on passive data
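The next-token prediction loss mentioned here is the average negative log-likelihood of each true next token under the model. A minimal sketch (the function name and the list-of-dicts representation are illustrative, not from the source):

```python
import math

def next_token_nll(dists, targets):
    """Average negative log-likelihood of the true next tokens.

    dists[i]: the model's predicted distribution at position i (token -> probability)
    targets[i]: the actual next token observed in the passive data
    """
    return -sum(math.log(dists[i][t]) for i, t in enumerate(targets)) / len(targets)
```

For example, a model that assigns probability 0.5 to the correct next token at every position incurs a loss of ln 2 per token.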
RPT: Reward-based Pretraining—training from scratch using reinforcement learning to maximize a reward signal
RFT: Reward-based Finetuning—applying RL after supervised pretraining
Passive Data: Data produced as the end product of human reasoning (e.g., Internet text); it records the 'what' (final answers and artifacts) but omits the 'why' (the intermediate reasoning traces)
Spurious Correlations: Superficial statistical patterns between tokens that models exploit to predict answers without understanding the underlying logic
Brainf**k: An esoteric programming language with a minimal command set (8 commands) and a tape-based memory model
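The language's minimalism makes an interpreter easy to sketch. The following covers seven of the eight commands (omitting `,` for input); the function name and tape length are illustrative:

```python
def run_bf(code: str, tape_len: int = 30000) -> str:
    """Minimal Brainf**k interpreter: a tape of byte cells and a data pointer."""
    tape = [0] * tape_len
    out = []
    # Precompute matching bracket positions for [ and ]
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jump[i], jump[j] = j, i
    ptr = pc = 0
    while pc < len(code):
        c = code[pc]
        if c == '>': ptr += 1                          # move pointer right
        elif c == '<': ptr -= 1                        # move pointer left
        elif c == '+': tape[ptr] = (tape[ptr] + 1) % 256   # increment cell
        elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256   # decrement cell
        elif c == '.': out.append(chr(tape[ptr]))      # output cell as a character
        elif c == '[' and tape[ptr] == 0: pc = jump[pc]    # jump past matching ]
        elif c == ']' and tape[ptr] != 0: pc = jump[pc]    # loop back to matching [
        pc += 1
    return ''.join(out)
```

For instance, `++++++++[>++++++++<-]>+.` builds 8*8+1 = 65 in the second cell and prints "A".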
Befunge: A two-dimensional stack-based esoteric programming language where code execution follows paths on a grid
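The two-dimensional control flow can be illustrated with a small interpreter covering only a subset of Befunge-93 (digits, `+`, `.`, the four direction commands, and `@`; real Befunge also pads the grid and prints a space after numbers, which this sketch skips):

```python
def run_befunge(src: str) -> str:
    """Tiny Befunge-93 subset: the program counter walks a 2D grid of commands."""
    grid = src.splitlines()
    x = y = 0
    dx, dy = 1, 0          # execution starts at the top-left, moving right
    stack, out = [], []
    while True:
        c = grid[y][x] if x < len(grid[y]) else ' '
        if c.isdigit():
            stack.append(int(c))           # digits push their value
        elif c == '+':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)            # pop two, push the sum
        elif c == '.':
            out.append(str(stack.pop()))   # pop and print as an integer
        elif c == '>': dx, dy = 1, 0       # redirect execution right
        elif c == '<': dx, dy = -1, 0      # ... left
        elif c == '^': dx, dy = 0, -1      # ... up
        elif c == 'v': dx, dy = 0, 1       # ... down
        elif c == '@':
            return ''.join(out)            # halt
        x = (x + dx) % len(grid[y])        # wrap around the current row
        y = (y + dy) % len(grid)
```

The program below pushes 1 and 2, adds them, turns downward, then leftward, and prints 3 before halting:

```python
program = ">12+v\n" \
          "@.  <"
```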
KL penalty: A regularization term used in RL to keep the model's policy close to a reference policy (usually the pretrained model)
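In practice the penalty is often applied per sample by subtracting a scaled log-probability gap between the policy and the reference model from the task reward. A minimal sketch of this shaping (the function name and `beta` hyperparameter name are illustrative, not from the source):

```python
def kl_regularized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Task reward minus a scaled sample-based KL estimate.

    reward: scalar task reward for the generated sequence
    logp_policy: per-token log-probs of the sequence under the current policy
    logp_ref: per-token log-probs under the reference (pretrained) model
    beta: KL coefficient controlling how tightly the policy is anchored
    """
    # Sample-based KL estimate: sum over tokens of log pi(a) - log pi_ref(a)
    kl = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return reward - beta * kl
```

A larger `beta` keeps the policy closer to the reference model at the cost of reward maximization; `beta = 0` removes the anchor entirely.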