
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yong-Xu Wu, Yuchen Wu, Yihang Xia, Hua Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, et al.
ByteDance
arXiv.org (2025)
Reasoning · RL · Benchmark

πŸ“ Paper Summary

Automated Theorem Proving (ATP) · Formal Mathematics (Lean 4) · Neuro-symbolic Geometry Solving
Seed-Prover combines a lemma-style whole-proof LLM with iterative refinement and a specialized geometry engine to solve IMO-level math problems in Lean 4.
Core Problem
LLMs struggle with formal theorem proving: natural-language proofs are difficult to verify automatically, and formal proofs offer little intermediate supervision beyond a binary compile-or-fail signal.
Why it matters:
  • Natural language proofs are hard to verify automatically, hindering effective reinforcement learning
  • Step-level provers (generating line-by-line) often lack high-level reasoning capabilities needed for complex proofs
  • Existing whole-proof models fail to make effective use of compiler feedback or to explore the problem broadly by conjecturing useful intermediate properties
Concrete Example: For a challenging functional equation problem, a standard prover might try to solve it directly and fail. Seed-Prover proposes diverse conjectures (e.g., 'f is injective') and proves them as lemmas first, building a library of facts to solve the main problem.
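To make the decomposition concrete, here is a minimal Lean 4 sketch of the lemma-style pattern on a toy statement (not the paper's functional equation): an auxiliary fact is proved and compiler-verified on its own, then reused by the main theorem. The names shift_injective and cancel_one are illustrative only.

```lean
import Mathlib

-- Auxiliary conjecture, proved and compiler-verified independently,
-- then available as a reusable fact in the lemma pool.
lemma shift_injective : Function.Injective (fun n : ℕ => n + 1) := by
  intro a b h
  simpa using h

-- The main statement consumes the verified lemma instead of
-- re-deriving the fact inline.
theorem cancel_one (a b : ℕ) (h : a + 1 = b + 1) : a = b :=
  shift_injective h
```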
Key Novelty
Lemma-Style Proving & Test-Time Scaling Strategies
  • Prioritizes generating independent, reusable lemmas before the main theorem, enabling modular verification and progress tracking that monolithic whole-proof generation lacks
  • A 'Proposer' module generates many conjectures about the problem (broad reasoning); those that can be proved populate a lemma pool that feeds the main proof (see the sketch after this list)
  • Implements a specialized 'Seed-Geometry' engine that combines a neural policy for auxiliary constructions with a fast C++ symbolic forward-chaining engine
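Below is a minimal Python sketch of how such a broad/deep loop could be organized, assuming access to an LLM prover and a Lean 4 compiler check. Every function here (propose_conjectures, attempt_proof, refine_proof, lean_check) is a hypothetical placeholder for illustration, not the paper's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Lemma:
    statement: str  # Lean 4 statement of a proved conjecture
    proof: str      # Lean 4 proof script that compiled successfully


@dataclass
class LemmaPool:
    lemmas: list[Lemma] = field(default_factory=list)

    def add(self, lemma: Lemma) -> None:
        self.lemmas.append(lemma)

    def context(self) -> str:
        # Verified lemmas are handed back to the prover as reusable facts.
        return "\n".join(l.statement for l in self.lemmas)


# --- Hypothetical placeholders (not the paper's API) ------------------------

def propose_conjectures(problem: str, context: str) -> list[str]:
    """LLM 'Proposer': return candidate conjectures as Lean 4 statements."""
    raise NotImplementedError


def attempt_proof(statement: str, context: str) -> str:
    """Whole-proof LLM prover: return a Lean 4 proof attempt."""
    raise NotImplementedError


def refine_proof(statement: str, proof: str, feedback: str, context: str) -> str:
    """Rewrite a failed proof, conditioning on the compiler's error messages."""
    raise NotImplementedError


def lean_check(statement: str, proof: str) -> tuple[bool, str]:
    """Compile the proof with Lean 4; return (success, error feedback)."""
    raise NotImplementedError


def solve(problem: str, rounds: int = 3, refinements: int = 3) -> str | None:
    pool = LemmaPool()
    for _ in range(rounds):
        # Broad reasoning: conjecture many properties of the problem
        # (e.g. "f is injective") and keep whichever ones can be proved.
        for conjecture in propose_conjectures(problem, pool.context()):
            proof = attempt_proof(conjecture, pool.context())
            ok, _ = lean_check(conjecture, proof)
            if ok:
                pool.add(Lemma(conjecture, proof))

        # Deep reasoning: attack the main theorem, iteratively refining the
        # whole proof against compiler feedback while reusing the lemma pool.
        proof = attempt_proof(problem, pool.context())
        for _ in range(refinements):
            ok, feedback = lean_check(problem, proof)
            if ok:
                return proof
            proof = refine_proof(problem, proof, feedback, pool.context())
    return None
```

The key property this sketch tries to mirror is that each lemma is checked by the compiler independently, so partial progress survives even when an attempt at the main proof fails.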
Evaluation Highlights
  • Proved 5 of the 6 IMO 2025 problems (within a post-contest time window), with the geometry problem solved in under 2 seconds
  • Achieved a 78.1% success rate on 155 past IMO problems (2000-2024), establishing a new state of the art
  • Saturated the MiniF2F-test benchmark with 99.6% accuracy under the medium inference setting
Breakthrough Assessment
9/10
Achieves near-perfect scores on standard benchmarks (MiniF2F) and solves 5 of 6 IMO 2025 problems, demonstrating formal-math capability at the level of a human silver medalist.