Large Language Models for Mathematical Reasoning: Progresses and Challenges

Janice Ahn, Rishu Verma, Renze Lou, Dingxiao Liu, Rui Zhang, Wenpeng Yin
The Pennsylvania State University, Temple University
Conference of the European Chapter of the Association for Computational Linguistics (2024)

📝 Paper Summary

Mathematical Reasoning Survey of LLMs
This survey comprehensively categorizes the landscape of LLM-based mathematical reasoning — covering problem types, datasets, prompting and fine-tuning techniques, and persistent challenges — in order to unify disparate research efforts.
Core Problem
The rapid growth of LLM-based math reasoning research has led to a vast, varied landscape with disparate datasets and metrics, making it difficult to discern true progress or shared obstacles.
Why it matters:
  • Current research is siloed by problem type (e.g., arithmetic vs. geometry), hampering understanding of obstacles that cut across the broader field
  • The lack of a unified framework makes it hard to assess whether LLMs are achieving generalized mathematical capability or merely overfitting to specific tasks
Concrete Example: Evaluating an LLM on simple arithmetic (e.g., '21 + 97') says little about its ability to handle geometry problems (e.g., computing the area of a described shape) or the rigorous logic required for Automated Theorem Proving, leading to inconsistent performance claims across the literature.
Key Novelty
Four-Dimensional Survey Framework
  • Structurally organizes the field into four dimensions: (1) problem types/datasets, (2) LLM-oriented techniques (prompting vs. fine-tuning), (3) influencing factors, and (4) persisting challenges
  • Distinguishes itself from prior surveys by focusing specifically on LLMs (rather than deep learning broadly) and by incorporating educational perspectives