
Deep Research: A Systematic Survey

Z Shi, Y Chen, H Li, W Sun, S Ni, Y Lyu, RZ Fan, B Jin…
Shandong University, Renmin University of China, Leiden University, Tsinghua University
arXiv, 11/2025
Agent · RAG · Memory · Reasoning · RL

📝 Paper Summary

Agentic RAG pipeline · Deep Research · Agentic Information Seeking · Full-stack AI Scientist
This survey formalizes Deep Research as an evolving paradigm where LLMs act as autonomous agents that plan queries, acquire evidence, manage memory, and synthesize comprehensive reports for open-ended tasks.
Core Problem
Standard RAG and single-shot prompting fail on open-ended tasks requiring critical thinking, multi-step verification, and long-horizon reasoning, as they lack autonomous workflows to decompose problems and manage extensive context.
Why it matters:
  • Real-world research tasks demand verifiable, self-contained outputs based on multi-source evidence, which simple retrieval augmentation cannot provide.
  • Existing surveys focus on static RAG or general web agents, missing the specific technical landscape of end-to-end research systems that synthesize long-form grounded reports.
  • Current LLMs suffer from hallucination and context loss when attempting complex, multi-step investigations without structured planning and memory management.
Concrete Example: When asked a complex question requiring cross-referencing multiple sources (e.g., a competitive market analysis), a standard RAG system might retrieve fragmented facts and hallucinate connections. A Deep Research system iteratively decomposes the query, browses live web pages, filters noise, and synthesizes a structured report with citations.
Key Novelty
Three-Stage Deep Research Roadmap & Component Taxonomy
  • Formalizes a three-phase evolution: from 'Agentic Search' (finding facts) to 'Integrated Research' (synthesizing reports) to 'Full-stack AI Scientist' (hypothesis generation and discovery).
  • Deconstructs the research workflow into four distinct, interactive components: Query Planning (decomposition), Information Acquisition (retrieval/tools), Memory Management (context maintenance), and Answer Generation (synthesis).
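The four-component loop can be illustrated with a toy Python sketch. It is not the survey's method or any specific system's implementation: the function names (`plan_queries`, `acquire`, `replan`, `generate_answer`, `deep_research`), the fixed-size `Memory` budget, and the dictionary standing in for live web search are all hypothetical simplifications chosen only to show how the components hand work to each other.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy corpus standing in for web search / tool calls.
CORPUS = {
    "market size": "placeholder evidence A",
    "pricing": "placeholder evidence B",
}


@dataclass
class Memory:
    """Memory Management: keeps only the most recent evidence within a budget (toy policy)."""
    max_items: int = 4
    notes: list[tuple[str, str]] = field(default_factory=list)

    def add(self, query: str, evidence: str) -> None:
        self.notes.append((query, evidence))
        # Drop the oldest notes once the budget is exceeded.
        if len(self.notes) > self.max_items:
            self.notes = self.notes[-self.max_items:]


def plan_queries(question: str) -> list[str]:
    """Query Planning: naive decomposition by splitting on ' and '."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def acquire(query: str, corpus: dict[str, str]) -> str | None:
    """Information Acquisition: exact-key lookup instead of a real search tool."""
    return corpus.get(query.lower())


def replan(failed: list[str]) -> list[str]:
    """Re-planning: broaden each failed multi-word sub-query to its last word; drop the rest."""
    return [q.split()[-1] for q in failed if len(q.split()) > 1]


def generate_answer(question: str, memory: Memory) -> str:
    """Answer Generation: stitch retained evidence into a report with numbered citations."""
    lines = [f"Report: {question}"]
    for i, (query, evidence) in enumerate(memory.notes, start=1):
        lines.append(f"- {query}: {evidence} [{i}]")
    return "\n".join(lines)


def deep_research(question: str, corpus: dict[str, str], max_rounds: int = 3) -> str:
    """Plan -> acquire -> remember -> re-plan until nothing is pending, then synthesize."""
    memory = Memory()
    pending = plan_queries(question)
    for _ in range(max_rounds):
        failed = []
        for query in pending:
            evidence = acquire(query, corpus)
            if evidence is None:
                failed.append(query)
            else:
                memory.add(query, evidence)
        pending = replan(failed)
        if not pending:
            break
    return generate_answer(question, memory)


print(deep_research("market size and competitor pricing", CORPUS))
```

Here "competitor pricing" misses on the first round, is broadened to "pricing" by the re-planner, and succeeds on the second round, so the report cites two pieces of evidence. In a real system each stub would be an LLM call or tool invocation; the sketch only fixes the control flow between the four components.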
Architecture
Figure 1: An overview of the four key components in a general Deep Research system and their interaction loop.
Evaluation Highlights
  • Provides a structured taxonomy of 100+ representative systems and datasets across diverse research tasks (e.g., AutoSurvey, Search-R1, TheAIScientist).
  • Categorizes evaluation into four domains: Agentic Information Seeking, Comprehensive Report Generation, AI for Research (idea/experiment generation), and Software Engineering.
  • Identifies key optimization techniques: Workflow Prompting (e.g., DeepResearch), Supervised Fine-Tuning (e.g., WebThinker), and End-to-End Reinforcement Learning (e.g., Search-R1).
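End-to-end RL approaches of this kind typically score a whole search-and-answer rollout with a rule-based outcome reward. The Python sketch below is a generic illustration under stated assumptions, not the reward of Search-R1 or any other specific system: it assumes the policy wraps its final answer in `<answer>...</answer>` tags and that correctness is judged by normalized exact match against a list of gold answers. The helper names (`extract_answer`, `normalize`, `outcome_reward`) are hypothetical.

```python
from __future__ import annotations

import re
import string


def extract_answer(rollout: str) -> str | None:
    """Pull the final answer out of a rollout, assuming <answer>...</answer> tags."""
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.DOTALL)
    return match.group(1).strip() if match else None


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(rollout: str, gold_answers: list[str]) -> float:
    """Binary outcome reward: 1.0 on a normalized exact match with any gold answer, else 0.0."""
    answer = extract_answer(rollout)
    if answer is None:
        # A rollout that never commits to an answer earns nothing.
        return 0.0
    pred = normalize(answer)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0


rollout = "<search>tallest structure in Paris</search> ... <answer> The Eiffel Tower. </answer>"
print(outcome_reward(rollout, ["Eiffel Tower"]))
```

Because the reward only inspects the final answer, intermediate search calls are shaped indirectly by whether they lead to a correct result; this is what distinguishes outcome-based RL from supervised fine-tuning on curated trajectories.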
Breakthrough Assessment
9/10
This is a foundational survey that defines and structures the emerging field of Deep Research, providing clear taxonomies and a roadmap that distinguishes it from standard RAG.