Lean 4: A functional programming language and interactive theorem prover used to formalize mathematics
Autoformalization: The process of using LLMs to translate informal natural language math problems into formal code (e.g., Lean)
Expert Iteration: A training method where a model generates solutions, verifies them (e.g., via compiler), and retrains on the correct solutions to improve itself
SFT: Supervised Fine-Tuning—training a model on a labeled dataset of correct inputs and outputs
Pass@k: A metric measuring the percentage of problems solved given k attempts (samples) per problem
Whole-proof generation: Generating the entire proof script in one go, rather than interacting with the proof assistant step-by-step
DPO: Direct Preference Optimization—an algorithm for aligning models to preferences without a separate reward model
RL: Reinforcement Learning—training agents to take actions that maximize a cumulative reward