Proof-Pile-2: A 55B-token dataset created by the authors, consisting of scientific papers (ArXiv), mathematical web text (OpenWebMath), and mathematical code (AlgebraicStack)
AlgebraicStack: A dataset of 11B tokens of source code specifically related to mathematics, spanning numerical, symbolic, and formal math
Code Llama: A family of large language models for code, based on Llama 2, used as the initialization for Llemma
Minerva: A proprietary Google model initialized from PaLM and trained on technical content; a key baseline
RoPE: Rotary Positional Embeddings—a method for encoding position information in transformers
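A toy sketch of the RoPE idea: consecutive feature pairs are rotated by a position-dependent angle, so the dot product between a rotated query and key depends only on their relative position. This is a single-vector, pure-Python illustration; real implementations operate on batched tensors and fuse this into the attention computation.

```python
import math

def rope(x, pos, theta=10000.0):
    """Apply rotary position embedding to one vector x at position pos.
    x has even length d; each pair (x[2i], x[2i+1]) is rotated by the
    angle pos * theta**(-2i/d)."""
    d = len(x)
    out = []
    for i in range(d // 2):
        angle = pos * theta ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # 2D rotation of the pair (a, b) by `angle`
        out += [a * c - b * s, a * s + b * c]
    return out
```

At position 0 the vector is unchanged, and shifting query and key positions by the same offset leaves their dot product unchanged, which is the relative-position property that makes RoPE useful.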
Flash Attention 2: An algorithm that speeds up attention computation and reduces memory usage
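The memory saving in FlashAttention comes from computing softmax in a single streaming pass (the "online softmax" trick), so the full score matrix never needs to be materialized. The sketch below shows that trick for a single attention row with scalar values; the actual algorithm applies it blockwise on GPU tiles.

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Streaming softmax-weighted sum: maintain a running max m and
    normalizer l, rescaling the accumulator whenever m grows. Equivalent
    to sum(softmax(scores)[i] * values[i]) without storing all exps."""
    m = float("-inf")   # running max (for numerical stability)
    l = 0.0             # running softmax normalizer
    acc = 0.0           # running weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        l = l * scale + math.exp(s - m_new)
        acc = acc * scale + math.exp(s - m_new) * v
        m = m_new
    return acc / l
```

Because each score is seen exactly once and only three scalars of state are kept, the same idea lets attention run over long sequences in fixed on-chip memory.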
chain of thought: A prompting technique where the model generates intermediate reasoning steps before the final answer
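A minimal illustration of chain-of-thought prompting: the few-shot example shows worked reasoning ending in a fixed answer phrase, and a small helper extracts the final answer. Both the prompt template and the `extract_answer` helper are hypothetical, not the paper's actual evaluation code.

```python
import re

# Hypothetical few-shot CoT prompt: the exemplar reasons step by step
# before stating the final answer in a fixed format.
prompt = (
    "Q: Natalia sold 48 clips in April and half as many in May. "
    "How many clips did she sell in total?\n"
    "A: In April she sold 48 clips. In May she sold 48 / 2 = 24 clips. "
    "In total she sold 48 + 24 = 72 clips. The answer is 72.\n"
)

def extract_answer(completion: str) -> str:
    """Pull the final answer from a 'The answer is N.' suffix
    (hypothetical format, for illustration only)."""
    m = re.search(r"The answer is ([-0-9.,/]+)", completion)
    return m.group(1).rstrip(".") if m else ""
```

Conditioning on such exemplars makes the model emit intermediate steps before the answer, which is what the definition above describes.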
majority voting: An ensemble technique where multiple reasoning paths are generated and the most consistent answer is selected
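Majority voting reduces to counting the final answers across sampled reasoning paths and returning the most common one, as in this minimal sketch:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled chains."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# e.g. five sampled reasoning chains yield these final answers:
samples = ["12", "12", "7", "12", "9"]
# majority_vote(samples) returns "12"
```

In practice the answers would first be normalized (e.g. via symbolic simplification) so that equivalent forms like `1/2` and `0.5` vote together.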
SymPy: A Python library for symbolic mathematics, used here for verifying mathematical equivalence of answers
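One common way to check equivalence with SymPy is to parse both answers and test whether their difference simplifies to zero; this two-line sketch is illustrative, not the paper's exact grading code.

```python
import sympy as sp

def equivalent(a: str, b: str) -> bool:
    """Treat two answer strings as equivalent if their symbolic
    difference simplifies to zero."""
    return sp.simplify(sp.sympify(a) - sp.sympify(b)) == 0

# equivalent("2*x + x", "3*x") is True; equivalent("x**2", "3*x") is False
```

This catches answers that are algebraically equal but written differently, which plain string matching would score as wrong.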
miniF2F: A benchmark for formal theorem proving containing olympiad-level math problems
formal-to-formal proving: Generating formal proof steps (tactics) given a formal statement and proof state
informal-to-formal proving: Generating a formal proof (autoformalization) given an informal natural language statement and proof
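To make the two formal settings concrete, here is a toy Lean theorem with a tactic proof (my own illustration, not a miniF2F problem): in formal-to-formal proving the model receives the statement and proof state and must produce the tactic line; in informal-to-formal proving it receives only an informal statement like "addition of natural numbers is commutative" (plus an informal proof) and must produce the entire formal artifact.

```lean
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```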
OpenWebMath: A 15B-token dataset of high-quality web pages filtered for mathematical content