SciInstruct: The dataset proposed in this paper, containing physics, chemistry, math, and formal proof instructions
SciGLM: The model resulting from fine-tuning ChatGLM3 on SciInstruct
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
Self-Reflective Annotation: A data generation process in which an LLM generates a solution, checks it against the ground truth, and, if the answer is wrong, critiques and revises its own reasoning
Lean: A functional programming language and theorem prover used for writing formal mathematical proofs
ORM: Outcome Reward Model, a reward model that scores a model output by the correctness of its final result rather than by the correctness of its intermediate reasoning steps
OCR: Optical Character Recognition—converting images of text (like textbook problems) into machine-encoded text
Pass@K: An evaluation metric measuring the probability that at least one correct solution is generated out of K attempts
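The Self-Reflective Annotation entry describes a generate, check, critique, and revise loop. The following is a minimal sketch of that loop, not the paper's actual pipeline. `generate`, `critique_and_revise`, and `extract_answer` are hypothetical stand-ins for LLM calls and answer parsing. The cap on revision rounds (`max_rounds`) and the choice to pass the ground truth to the revision step are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Annotation:
    question: str
    solution: str
    rounds: int  # revision rounds needed (0 = the first attempt was already correct)


def self_reflective_annotate(
    question: str,
    ground_truth: str,
    generate: Callable[[str], str],                       # hypothetical LLM call: question -> CoT solution
    critique_and_revise: Callable[[str, str, str], str],  # hypothetical LLM call: (question, solution, ground truth) -> revised solution
    extract_answer: Callable[[str], str],                 # hypothetical parser: solution -> final answer
    max_rounds: int = 2,                                  # assumed cap on revision attempts
) -> Optional[Annotation]:
    """Return a solution whose final answer matches the ground truth, or None if none is reached."""
    solution = generate(question)
    for round_idx in range(max_rounds + 1):
        # Check the current solution's final answer against the ground truth.
        if extract_answer(solution).strip() == ground_truth.strip():
            return Annotation(question, solution, round_idx)
        if round_idx < max_rounds:
            # Wrong answer: have the model critique and revise its own reasoning.
            solution = critique_and_revise(question, solution, ground_truth)
    return None  # still wrong after every allowed revision; discard the sample


# Toy stand-ins: the first attempt is wrong, the revision is right.
gen = lambda q: "2 + 2 = 5. Answer: 5"
rev = lambda q, s, gt: "2 + 2 = 4. Answer: 4"
ans = lambda s: s.rsplit("Answer:", 1)[-1]
print(self_reflective_annotate("What is 2 + 2?", "4", gen, rev, ans).rounds)  # 1
```

Returning `None` rather than keeping an unverified solution reflects the idea that only answers checked against the ground truth are retained. How the actual pipeline treats samples that never become correct is not specified here.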
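To make the Lean entry concrete, here is a small Lean 4 fragment. It is an illustration only and is not taken from SciInstruct's formal-proof data. Each `theorem` or `example` states a proposition, and the term or tactic block after `:=` must be accepted by Lean's kernel for the proof to check. The name `my_add_comm` is hypothetical. `Nat.add_comm` is from Lean 4's core library.

```lean
-- A term-mode proof that reuses a core library lemma.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Holds by definitional unfolding of `Nat.add`, so `rfl` suffices.
example (n : Nat) : n + 0 = n := rfl

-- A tactic-mode proof: build the conjunction from its two components.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  exact ⟨hp, hq⟩
```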
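The ORM entry contrasts outcome-level evaluation with step-level evaluation. The following is a minimal sketch under the assumption that each sampled solution is a list of reasoning steps whose last element is the final answer. An outcome label depends only on that final answer, so a flawed step goes unnoticed whenever the answer happens to match. Such (solution, label) pairs are the kind of data an ORM is trained on. `outcome_labels` and `best_of_n` are hypothetical helper names, and `score` stands in for a trained ORM.

```python
from typing import Callable, List, Sequence, Tuple

Solution = List[str]  # reasoning steps; the last element is the final answer


def outcome_labels(solutions: Sequence[Solution], ground_truth: str) -> List[Tuple[Solution, int]]:
    """Label each whole solution 1 or 0 by its final answer alone; intermediate steps are never inspected."""
    return [(sol, int(sol[-1].strip() == ground_truth.strip())) for sol in solutions]


def best_of_n(solutions: Sequence[Solution], score: Callable[[Solution], float]) -> Solution:
    """Pick the highest-scoring sample; `score` is a hypothetical stand-in for a trained ORM."""
    return max(solutions, key=score)


samples = [
    ["x + 3 = 7", "x = 7 + 3", "10"],  # wrong step, wrong answer
    ["x + 3 = 7", "x = 7 - 3", "4"],   # correct step, correct answer
    ["x + 3 = 7", "x = 3 - 7", "4"],   # wrong step, yet the final answer matches
]
print([label for _, label in outcome_labels(samples, "4")])  # [0, 1, 1]
```

The third sample receives label 1 despite its incorrect step. This is the trade-off the definition points to: outcome supervision is cheap to obtain but does not check the reasoning itself.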
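The Pass@K entry is commonly computed with an unbiased estimator rather than by literally drawing K samples. For each problem, draw n ≥ k samples, count the c correct ones, and compute pass@k = 1 - C(n-c, k) / C(n, k). Then average the result over problems. The sketch below assumes this estimator. How the paper itself sets n, k, or the sampling temperature is not stated here.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset contains only wrong samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(n=10, c=3, k=1))  # ≈ 0.3
print(pass_at_k(n=10, c=3, k=5))  # ≈ 0.917
```

Using all n samples instead of a single draw of k reduces the variance of the estimate. With k = 1, the estimator reduces to the fraction of correct samples, c / n.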