
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou
Renmin University of China, Tsinghua University
arXiv, January 2025
RAG, Agent, Reasoning

📝 Paper Summary

Agentic RAG pipeline, Large Reasoning Models (LRMs)
Search-o1 enhances large reasoning models by integrating an agentic retrieval mechanism that autonomously searches for external knowledge when needed and refines documents into concise reasoning steps.
Core Problem
Large Reasoning Models (LRMs) such as o1 often lack the knowledge they need partway through a long reasoning chain, which leads to hallucinations or errors.
Why it matters:
  • Extended reasoning chains can cause 'overthinking' and propagate errors from a single knowledge gap throughout the entire logical flow
  • Standard RAG retrieves once before reasoning, which fails to address diverse knowledge needs that arise dynamically during multi-step problem solving
Concrete Example: When asked for the carbon atom count of a reaction product, a standard model that does not know the structure of an intermediate such as 'trans-Cinnamaldehyde' may simply guess it. Search-o1 detects this gap, pauses, searches for the specific structure, and integrates the retrieved fact so that reasoning continues correctly.
Key Novelty
Agentic Search-Enhanced LRM Framework
  • Integrates an agentic search workflow directly into the chain-of-thought, allowing the model to autonomously pause and retrieve information on demand
  • Introduces a 'Reason-in-Documents' module that summarizes retrieved content into concise reasoning steps before insertion, preventing long documents from disrupting the chain-of-thought
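The two ideas above can be sketched as a single control loop. This is a minimal illustration, not the authors' implementation: the marker strings `<search>` and `</search>`, the `[search result]` label, and the callables `generate`, `retrieve`, and `refine` are all hypothetical names introduced here, and the summary does not say how the real system signals a query or calls its retriever.

```python
import re
from typing import Callable, List

# Hypothetical markers (assumption): the summary does not specify how the
# model signals that it wants to search.
BEGIN_QUERY = "<search>"
END_QUERY = "</search>"
QUERY_RE = re.compile(
    re.escape(BEGIN_QUERY) + r"(.*?)" + re.escape(END_QUERY), re.DOTALL
)


def agentic_reasoning(
    question: str,
    generate: Callable[[str], str],                 # model call: context -> next text
    retrieve: Callable[[str], List[str]],           # search call: query -> documents
    refine: Callable[[str, List[str], str], str],   # "Reason-in-Documents" stand-in
    max_searches: int = 5,
) -> str:
    """Return the full reasoning trace, pausing to search when the model asks."""
    context = question
    for _ in range(max_searches):
        step = generate(context)
        match = QUERY_RE.search(step)
        if match is None:
            # No search requested: the model finished reasoning on its own.
            return context + step
        # Keep the reasoning up to and including the query, drop the rest.
        context += step[: match.end()]
        query = match.group(1).strip()
        docs = retrieve(query)
        # Condense the raw documents into a short note before insertion, so
        # long documents never enter the chain-of-thought directly.
        note = refine(query, docs, context)
        context += f"\n[search result] {note}\n"
    # Search budget exhausted: generate once more without handling queries.
    return context + generate(context)


# Stub components for illustration only.
def fake_generate(context: str) -> str:
    if "[search result]" not in context:
        return " I need X. <search>lookup X</search> (ignored tail)"
    return "Therefore the answer uses FACT."


def fake_retrieve(query: str) -> List[str]:
    return [f"A very long document about: {query} ..."]


def fake_refine(query: str, docs: List[str], context: str) -> str:
    return "X is FACT"


trace = agentic_reasoning("Q: question", fake_generate, fake_retrieve, fake_refine)
print(trace)
```

The design point the sketch tries to capture is that only the refined note is appended to the context, while the raw documents stay outside the reasoning chain.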
Evaluation Highlights
  • Reduces expressed uncertainty, measured by counts of hedging terms such as 'perhaps', from more than 30 occurrences to near zero on complex reasoning tasks, relative to vanilla LRMs
  • Outperforms standard RAG and direct reasoning baselines on 5 complex reasoning benchmarks spanning science, math, and coding, and on 6 open-domain QA benchmarks
  • Attributes these gains to retrieving and refining knowledge iteratively and only when necessary, which preserves the coherence of the reasoning chain
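The uncertainty measure in the first highlight amounts to counting hedging terms in a reasoning trace. A minimal sketch of such a counter follows, assuming whole-word, case-insensitive matching; only 'perhaps' comes from the summary, and the other terms and the function name `count_hedges` are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable

# Only 'perhaps' is named in the summary; the other terms are assumptions.
HEDGE_TERMS = ("perhaps", "maybe", "possibly")


def count_hedges(text: str, terms: Iterable[str] = HEDGE_TERMS) -> Counter:
    """Count whole-word, case-insensitive occurrences of each hedging term."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


sample = "Perhaps it is A. Or perhaps B, maybe."
hedges = count_hedges(sample)
print(hedges["perhaps"], hedges["maybe"], hedges["possibly"])  # 2 1 0
```

Comparing this count on traces from a vanilla LRM and from the search-enhanced variant would give one rough proxy for the reduction described above.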
Breakthrough Assessment
8/10
Significant step in making o1-style reasoning robust to knowledge gaps. The combination of agentic retrieval with a dedicated refinement step to maintain reasoning flow is a strong architectural contribution.