TextbookReasoning: A new dataset of 650k reasoning questions extracted from 12k university-level textbooks with truthful reference answers
MegaScience: A composite dataset of 1.25M instances combining TextbookReasoning with filtered subsets of public datasets (NaturalReasoning, Nemotron-Science)
SFT: Supervised Fine-Tuning—training a pre-trained base model on labeled examples to follow instructions
Decontamination: The process of removing training data that overlaps with evaluation benchmarks, so that scores reflect generalization rather than memorized test items; this paper combines embedding similarity with LLM verification
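A minimal sketch of the embedding-similarity stage, assuming embeddings have already been computed by some sentence encoder (the 0.9 threshold and the function names here are illustrative, not the paper's exact pipeline); items it flags would then go to LLM verification:

```python
import numpy as np

def cosine_sim_matrix(train_emb: np.ndarray, bench_emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between training and benchmark embeddings."""
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    b = bench_emb / np.linalg.norm(bench_emb, axis=1, keepdims=True)
    return a @ b.T

def flag_contaminated(train_emb: np.ndarray, bench_emb: np.ndarray,
                      threshold: float = 0.9) -> np.ndarray:
    """Indices of training items whose max similarity to any benchmark
    item meets the threshold (candidates for removal / LLM verification)."""
    sims = cosine_sim_matrix(train_emb, bench_emb)
    return np.where(sims.max(axis=1) >= threshold)[0]
```

For example, a training embedding nearly parallel to a benchmark embedding is flagged, while an orthogonal one is kept.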
CoT: Chain of Thought—prompting models to generate intermediate reasoning steps before the final answer
DeepSeek-R1: A strong reasoning model used in this paper to generate or refine solutions for the datasets
Locality-sensitive min-hashing: A deduplication technique that compresses each document's shingle set into a short signature whose per-position match rate approximates Jaccard similarity, letting near-duplicates be found without all-pairs comparison
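A self-contained sketch of the MinHash idea (word-shingle size, hash count, and function names are illustrative choices, not the paper's exact configuration):

```python
import hashlib
import random

def shingles(text: str, k: int = 3) -> set[str]:
    """k-word shingles of a document, lowercased and split on whitespace."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set: set[str], num_hashes: int = 128,
                      seed: int = 0) -> list[int]:
    """One min-value per hash function; the probability that two signatures
    agree at a position equals the Jaccard similarity of the two sets."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(64) for _ in range(num_hashes)]  # simulate hash family
    def h(s: str) -> int:
        return int.from_bytes(
            hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")
    return [min(h(s) ^ mask for s in shingle_set) for mask in masks]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

In a full LSH deduplicator, signatures are further split into bands and hashed into buckets so only documents sharing a bucket are compared.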
Pass@1: An evaluation metric measuring the percentage of problems where the model's first generated answer is correct
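As an illustration, pass@1 reduces to an average over per-problem correctness judgments (assuming one boolean verdict on the first sampled answer per problem):

```python
def pass_at_1(first_answer_correct: list[bool]) -> float:
    """Percentage of problems whose first generated answer was judged correct."""
    return 100.0 * sum(first_answer_correct) / len(first_answer_correct)
```

For example, 3 correct first answers out of 4 problems gives a pass@1 of 75.0.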