
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Valentin Lacombe, Valentin Quesnel, Damien Sileo
arXiv (2026)
Reasoning · Pretraining · RL · Benchmark

📝 Paper Summary

Synthetic Data Generation · Symbolic Reasoning · Pre-training · Reinforcement Learning
Reasoning Core provides a scalable suite of procedurally generated, solver-verified symbolic tasks (such as planning and logic) that improve language model reasoning when mixed into pre-training data.
Core Problem
Existing procedural data generators rely on narrow templates or fixed puzzles (e.g., just BlocksWorld), lacking the distributional breadth needed to instill general reasoning primitives during pre-training.
Why it matters:
  • Training on narrow distributions (e.g., single planning domains) fails to generalize to minor variations
  • Scaling reasoning capabilities requires verifiable data beyond web text, but prolonged RL is compute-intensive
  • Current suites like Reasoning Gym prioritize task count over the distributional generality required for effective pre-training
Concrete Example: Training on a single PDDL domain like BlocksWorld does not generalize to other planning problems. Reasoning Core instead samples randomized PDDL domains covering the full class of STRIPS problems.
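To make this concrete, here is a minimal Python sketch of what sampling a randomized STRIPS-style PDDL domain could look like. It is an illustrative assumption, not the paper's actual generator: the function name `sample_strips_domain`, the predicate/action naming, and the sampling scheme are all hypothetical.

```python
import random

def sample_strips_domain(rng: random.Random,
                         n_predicates: int = 4,
                         n_actions: int = 3,
                         max_arity: int = 2) -> str:
    """Emit a randomized STRIPS-style PDDL domain string.

    Hypothetical illustration: names, arities, and the
    precondition/effect sampling scheme are assumptions,
    not the paper's actual implementation.
    """
    # Sample a random predicate vocabulary: (name, arity) pairs.
    preds = [(f"p{i}", rng.randint(1, max_arity)) for i in range(n_predicates)]
    lines = ["(define (domain random-strips)",
             "  (:requirements :strips)",
             "  (:predicates"]
    for name, arity in preds:
        args = " ".join(f"?x{j}" for j in range(arity))
        lines.append(f"    ({name} {args})")
    lines.append("  )")
    # Sample actions with random preconditions and add/delete effects.
    for a in range(n_actions):
        arity = rng.randint(1, max_arity)
        params = [f"?v{j}" for j in range(arity)]

        def lit() -> str:
            # A random literal over this action's parameters.
            name, p_arity = rng.choice(preds)
            args = " ".join(rng.choice(params) for _ in range(p_arity))
            return f"({name} {args})"

        pre = " ".join(lit() for _ in range(rng.randint(1, 2)))
        lines += [f"  (:action act{a}",
                  f"    :parameters ({' '.join(params)})",
                  f"    :precondition (and {pre})",
                  f"    :effect (and {lit()} (not {lit()}))",
                  "  )"]
    lines.append(")")
    return "\n".join(lines)

print(sample_strips_domain(random.Random(0)))
```

Because the domain itself is sampled rather than fixed, no two training instances need share surface structure, which is exactly the distributional breadth the paper argues fixed puzzles like BlocksWorld lack.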
Key Novelty
High-Generality Procedural Symbolic Suite
  • Generates data for foundational formal domains (planning, logic, equations) using randomized parameters rather than fixed templates to ensure broad distributional coverage
  • Integrates external solvers (theorem provers, planning engines) to provide rigorous verification and reward signals for every generated instance
  • Uses a continuous 'difficulty knob' to scale problem complexity (e.g., proof depth, plan length) for curriculum learning; a sketch of one possible knob mapping follows this list
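As referenced above, a minimal sketch of how a continuous knob might map to instance parameters. The linear scaling and ranges here are assumptions; the paper only states that the knob controls complexity such as proof depth and plan length.

```python
import random
from dataclasses import dataclass

@dataclass
class TaskParams:
    plan_length: int   # planning tasks: target solution length
    proof_depth: int   # logic tasks: depth of the required derivation
    n_objects: int     # shared: size of the symbol universe

def params_from_knob(knob: float, rng: random.Random) -> TaskParams:
    """Map a continuous difficulty knob in [0, 5] to instance parameters.

    Hypothetical mapping: the linear scale and the ranges below are
    assumptions for illustration only.
    """
    scale = 1.0 + knob  # knob 0 -> easiest, knob 5 -> hardest
    return TaskParams(
        plan_length=rng.randint(2, int(2 + 4 * scale)),
        proof_depth=rng.randint(1, int(1 + 2 * scale)),
        n_objects=rng.randint(3, int(3 + 3 * scale)),
    )

rng = random.Random(0)
for knob in (0.0, 2.5, 5.0):
    print(knob, params_from_knob(knob, rng))
```

A single scalar like this makes curriculum schedules trivial to express: ramp the knob over training steps and every task family hardens in lockstep.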
Evaluation Highlights
  • Mixing Reasoning Core data (r=0.5 ratio) into pre-training consistently improves PlatinumBench reasoning performance across three different base corpora (FineWeb, Dolci, SYNTH); see the mixing sketch after this list
  • Symbolic data integration preserves or slightly improves validation loss on general natural language modeling, avoiding the 'tax' often paid for reasoning specialization
  • Zero-shot evaluation confirms tasks remain challenging for frontier models like GPT-5, particularly at higher difficulty settings (knob level 5)
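The mixing itself can be pictured as a simple two-stream sampler. This sketch assumes a document-level Bernoulli draw with probability r, which is one plausible reading of the r=0.5 ratio rather than the paper's reported procedure.

```python
import random
from itertools import islice

def mix_streams(corpus_docs, symbolic_docs, r: float, rng: random.Random):
    """Yield a pre-training stream where roughly a fraction r of documents
    come from the symbolic suite and 1 - r from the base corpus.

    Hypothetical, simplified implementation: stops when either
    stream is exhausted.
    """
    corpus_it, symbolic_it = iter(corpus_docs), iter(symbolic_docs)
    while True:
        source = symbolic_it if rng.random() < r else corpus_it
        try:
            yield next(source)
        except StopIteration:
            return

rng = random.Random(0)
web = (f"web_doc_{i}" for i in range(10))
sym = (f"symbolic_task_{i}" for i in range(10))
print(list(islice(mix_streams(web, sym, r=0.5, rng=rng), 8)))
```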
Breakthrough Assessment
8/10
Strong contribution to synthetic data infrastructure. Moves beyond templated puzzles to solver-verified, high-generality domains essential for scaling reasoning. The demonstration of pre-training gains without language degradation is significant.