โ† Back to Paper List

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

Junde Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, Yueming Jin
University of Oxford, National University of Singapore, Carnegie Mellon University, Mohamed bin Zayed University of Artificial Intelligence
arXiv (2025)
Agent Reasoning Memory RAG KG Benchmark

๐Ÿ“ Paper Summary

Agentic RAG pipeline Memory organization Tool-use post-training
Agentic Reasoning enhances LLM problem-solving by dynamically integrating web search, code execution, and a structured Mind-Map memory into the reasoning chain to handle complex, knowledge-intensive tasks.
Core Problem
Current reasoning models excel in structured domains such as math and code, but they struggle with open-ended, knowledge-intensive tasks that require extensive research, and with maintaining coherence over long reasoning chains.
Why it matters:
  • Applying math/code-style reasoning to social sciences or experiential fields often produces flawed or overly rigid results
  • Open-source models lag behind proprietary systems (such as OpenAI Deep Research) in deep research capabilities due to a lack of effective external tool integration
  • LLMs frequently lose track of context or hallucinate when attempting long reasoning sequences without structured memory
Concrete Example: When asked a riddle about family relationships involving a surgeon ('The surgeon... says I can't operate on this child, he's my son!'), DeepSeek-R1 answers incorrectly after 17 seconds of reasoning, misled by bias. Agentic Reasoning instead uses a Mind-Map to represent the entities [surgeon], [boy], and [father] explicitly as a graph and correctly identifies the relationship.
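To make the Mind-Map idea concrete, here is a minimal sketch of an entity graph for the riddle, assuming a plain store of (subject, relation, object) triples queried by breadth-first search. The `MindMap` class, its methods, and the relation names `parent_of` and `child_of` are hypothetical; the paper's agent constructs its graph from the reasoning context automatically, whereas here the single triple is written by hand from the quoted sentence.

```python
from collections import defaultdict, deque


class MindMap:
    """Toy knowledge graph of (subject, relation, object) triples.

    Hypothetical stand-in for the paper's Mind-Map agent; triples are
    added by hand instead of being extracted from the reasoning context.
    """

    def __init__(self):
        # node -> list of (relation, neighbor) pairs
        self.edges = defaultdict(list)

    def add(self, subj, rel, obj, inverse=None):
        self.edges[subj].append((rel, obj))
        if inverse is not None:
            # store the reverse edge too, so queries work in both directions
            self.edges[obj].append((inverse, subj))

    def path(self, start, goal):
        """Breadth-first search for the shortest chain of triples start -> goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, trail = queue.popleft()
            if node == goal:
                return trail
            for rel, nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, trail + [(node, rel, nxt)]))
        return None  # the two entities are not connected


mind_map = MindMap()
# "I can't operate on this child, he's my son!" -> the speaker is the child's parent
mind_map.add("surgeon", "parent_of", "boy", inverse="child_of")
print(mind_map.path("boy", "surgeon"))  # [('boy', 'child_of', 'surgeon')]
```

The point of the explicit graph is that the stated relation is looked up rather than re-inferred, so a familiar but wrong prior cannot silently overwrite it.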
Key Novelty
Agentic Reasoning Framework with Mind-Map Memory
  • Integrates three specific agents (Web-Search, Code, Mind-Map) directly into the reasoning loop via special tokens, allowing the model to pause, query tools, and reintegrate results
  • Introduces a 'Mind-Map' agent that constructs a dynamic knowledge graph from the reasoning context, allowing the model to query its own past thoughts and maintain coherence over long chains
  • Optimizes the Web-Search agent by combining query breakdown, reranking, and Mind-Map context, and finds this combination superior to standard RAG or knowledge refinement alone
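The three-stage search pipeline in the last bullet can be sketched as follows, assuming a regex split for query breakdown and a term-overlap reranker that gives a smaller bonus to terms from the Mind-Map context. The functions `break_down`, `rerank_score`, and `search_agent`, and the 0.5 context weight, are hypothetical stand-ins; in practice these stages would more likely be handled by an LLM and a learned reranker.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def break_down(question):
    """Hypothetical query breakdown: split a compound question at '?' and 'and'."""
    parts = re.split(r"\?|\band\b", question)
    return [p.strip() for p in parts if p.strip()]


def rerank_score(sub_query, doc, mind_map_context, context_weight=0.5):
    """Toy reranker: term overlap with the sub-query, plus a down-weighted
    bonus for overlap with Mind-Map context (context_weight is arbitrary)."""
    doc_terms = Counter(tokenize(doc))
    query_hits = sum(doc_terms[t] for t in set(tokenize(sub_query)))
    context_hits = sum(doc_terms[t] for t in set(tokenize(mind_map_context)))
    return query_hits + context_weight * context_hits


def search_agent(question, corpus, mind_map_context, k=1):
    """Break the question down, then keep the top-k documents per sub-query."""
    results = {}
    for sub_query in break_down(question):
        ranked = sorted(
            corpus,
            key=lambda doc: rerank_score(sub_query, doc, mind_map_context),
            reverse=True,
        )
        results[sub_query] = ranked[:k]
    return results


corpus = [
    "GPQA Diamond is a set of graduate-level science questions.",
    "GAIA is a benchmark for general AI assistants.",
    "Humanity's Last Exam is a broad multi-domain benchmark.",
]
hits = search_agent("What is GAIA and what is GPQA Diamond?", corpus, "benchmark assistants")
for sub_query, docs in hits.items():
    print(sub_query, "->", docs[0])
```

Each sub-query is ranked independently, so a compound question retrieves one focused document per part instead of one compromise document for the whole question.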
Architecture
Figure 1: The Agentic Reasoning workflow, in which the LLM halts generation to invoke external tools (Search, Code, Mind-Map) and then reintegrates their results.
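The halt-call-reintegrate loop in Figure 1 can be sketched as a driver that alternates between generation and tool calls. This is a minimal sketch under stated assumptions: the tags `<search>`, `<code>`, and `<mindmap>` are hypothetical placeholders rather than the paper's actual special tokens, `run_agentic_reasoning` and `scripted_model` are illustrative names, and the "model" simply replays fixed text.

```python
import re

# Hypothetical tool-call markup; these tag names are placeholders,
# not the special tokens used in the paper.
TOOL_CALL = re.compile(r"<(search|code|mindmap)>(.*?)</\1>", re.DOTALL)


def run_agentic_reasoning(generate, tools, prompt, max_calls=8):
    """Alternate between model generation and tool calls.

    generate(context) returns the next chunk of text and is assumed to stop
    right after a closing tool tag or at the end of its answer.
    tools maps each tool name to a callable taking and returning a string.
    """
    context = prompt
    for _ in range(max_calls):
        chunk = generate(context)
        context += chunk
        call = TOOL_CALL.search(chunk)
        if call is None:
            return context  # no tool requested, so reasoning is finished
        name, query = call.group(1), call.group(2).strip()
        # pause, run the tool, and splice its output back into the chain
        context += f"\n[{name} result] {tools[name](query)}\n"
    return context


def scripted_model(chunks):
    """Stand-in for an LLM that replays a fixed list of chunks."""
    replay = iter(chunks)
    return lambda context: next(replay)


model = scripted_model([
    "Which benchmark is this? <search>GAIA benchmark</search>",
    "So GAIA evaluates general AI assistants.",
])
search_index = {"GAIA benchmark": "GAIA is a benchmark for general AI assistants."}
tools = {"search": lambda q: search_index.get(q, "no results")}
answer = run_agentic_reasoning(model, tools, "Q: What is GAIA?\n")
print(answer)
```

Because the tool output is appended to the same context the model continues from, the next generation step conditions on it directly; `max_calls` bounds the loop if the model keeps requesting tools.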
Evaluation Highlights
  • Achieves 23.8% accuracy on Humanity's Last Exam, a 14.4-percentage-point improvement over the raw model, narrowing the gap with OpenAI Deep Research to 2.8 points
  • Surpasses o3-mini-high on the GPQA Diamond benchmark with 66.8% accuracy (vs. 64.1%)
  • Establishes a new state of the art among public methods on the GAIA benchmark, outperforming OpenAI Deep Research on Level 1 and Level 2 tasks
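Reading the Humanity's Last Exam figures as absolute percentage points, the scores they imply for the raw model and for OpenAI Deep Research follow by simple arithmetic. These values are back-calculated under that assumption and are not numbers reported in this summary.

```python
# Percent-accuracy figures quoted in the Evaluation Highlights.
hle_agentic = 23.8   # Agentic Reasoning on Humanity's Last Exam
hle_gain = 14.4      # gain over the raw model, read as percentage points (assumption)
hle_gap = 2.8        # remaining gap to OpenAI Deep Research, same reading

# Implied scores, rounded to one decimal to suppress floating-point noise.
hle_raw_model = round(hle_agentic - hle_gain, 1)
hle_deep_research = round(hle_agentic + hle_gap, 1)
gpqa_margin = round(66.8 - 64.1, 1)  # Agentic Reasoning vs o3-mini-high

print(f"HLE raw model: {hle_raw_model}%")                 # HLE raw model: 9.4%
print(f"HLE OpenAI Deep Research: {hle_deep_research}%")  # HLE OpenAI Deep Research: 26.6%
print(f"GPQA Diamond margin: {gpqa_margin} points")       # GPQA Diamond margin: 2.7 points
```

Under this reading, Agentic Reasoning more than doubles the raw model's HLE score, while its GPQA Diamond lead over o3-mini-high is 2.7 points.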
Breakthrough Assessment
9/10
Significantly narrows the gap between open-source and proprietary 'Deep Research' models. The Mind-Map concept for maintaining reasoning coherence is a strong architectural contribution.