← Back to Paper List

Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design

Hai Xia, Carla P. Gomes, Bart Selman, Stefan Szeider
arXiv (2026)
Agent Memory Reasoning

πŸ“ Paper Summary

Neurosymbolic AI AI for Mathematics
An LLM agent equipped with symbolic tools and persistent memory collaborates with a human researcher to discover a new tight lower bound on Latin square imbalance by identifying data patterns symbolic methods missed.
Core Problem
Current AI for math focuses on fixed objectives (proving conjectures, optimizing known quantities), failing at open-ended discovery where the hypothesis itself must emerge from exploration.
Why it matters:
  • Pure symbolic methods hit combinatorial walls (e.g., exhaustive search fails at n=18 for this problem)
  • Pure neural methods (LLMs) lack the rigor required for proofs and often hallucinate constructive claims
  • Understanding the distinct roles of neural intuition, symbolic verification, and human steering is critical for replicating discovery workflows
Concrete Example: When searching for 'perfect permutations' to minimize imbalance, the symbolic solver hit a timeout at n=18. The agent, analyzing numerical data from a relaxed search, noticed a parity constraint (all distances were even) that the human had missed, leading to a new theorem.
Key Novelty
Agentic Neurosymbolic Collaboration Framework
  • Integrates an LLM agent (Claude Opus 4.5) with external symbolic tools (SageMath, Rust solvers) and a structured 'progressive disclosure' persistent memory to maintain context across sessions
  • Utilizes multi-model deliberation among frontier LLMs specifically for criticism and error detection, leveraging the finding that models are better at critiquing than constructing
  • Formalizes a workflow where the agent handles hypothesis generation/coding, symbolic tools handle verification, and the human provides strategic pivots (reframing the research question)
Evaluation Highlights
  • Discovered and proved a tight lower bound of 4n(n-1)/9 for Latin square imbalance (n ≑ 1 mod 3), settling an open theoretical question
  • Constructed 'near-perfect permutations' achieving this bound for all n up to 52, exceeding the prior computational limit of n=18
  • Refuted a multi-model consensus claim that modular inversion achieves O(n^2.5) imbalance, demonstrating it scales as Θ(n^3.6) instead
Breakthrough Assessment
9/10
Demonstrates genuine mathematical discovery (new theorem + construction) settling an open problem. The detailed process analysis of human-AI interaction offers a replicable blueprint for neurosymbolic research.
×