
Rethinking Deep Research from the Perspective of Web Content Distribution Matching

Zixuan Yu, Zhenheng Tang, Tongliang Liu, Chengqi Zhang, Xiaowen Chu, Bo Han
Hong Kong Baptist University, University of Sydney, University of Technology Sydney
arXiv (2026)
Agent RAG Reasoning

📝 Paper Summary

Agentic RAG pipeline · Web agents · Self-evolving · Agentic reasoning
WeDAS improves deep research agents with a probing mechanism that estimates how well a query aligns with the web's information density, then dynamically adjusts search granularity to avoid both noise and sparsity.
Core Problem
Deep Search Agents suffer from a structural misalignment between reasoning-driven queries and the actual web index; coarse queries return noise while hyper-specific queries return nothing.
Why it matters:
  • Agents act blindly without perceiving the 'information density' of the web, leading to wasted search steps and hallucinations
  • Existing frameworks treat search engines as static utilities rather than dynamic environments requiring calibration
  • The gap between high-level reasoning plans and low-level retrieval precision causes failure in complex, open-ended research tasks
Concrete Example: When solving a GAIA task, a coarse query triggers a deluge of irrelevant noise, while a hyper-specific query returns zero results (retrieval sparsity). In either case the agent is led down a failed trajectory, because it cannot sense that its query granularity is mismatched with the available web content.
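The two failure modes in this example can, in principle, be told apart from the result page alone. The sketch below is a minimal illustration, not the paper's method: it assumes each hit carries a relevance score in [0, 1] from some external scorer, and the `Result` type, the `diagnose_granularity` function, and all of its thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One search hit. `relevance` in [0, 1] is assumed to come from an external scorer."""
    title: str
    relevance: float


def diagnose_granularity(results: List[Result], min_hits: int = 3,
                         noise_floor: float = 0.5,
                         max_noise_ratio: float = 0.5) -> str:
    """Suggest how to adjust a query's granularity (all thresholds are illustrative).

    Returns "broaden" for retrieval sparsity, "narrow" for a noisy result page,
    and "keep" when the query appears aligned with the available content.
    """
    if len(results) < min_hits:
        # Hyper-specific query: too little matching content exists on the web.
        return "broaden"
    noisy = sum(r.relevance < noise_floor for r in results)
    if noisy / len(results) > max_noise_ratio:
        # Coarse query: most hits are off-topic noise.
        return "narrow"
    return "keep"


if __name__ == "__main__":
    coarse = [Result(f"hit {i}", 0.1) for i in range(20)]
    print(diagnose_granularity(coarse))  # narrow
    print(diagnose_granularity([]))      # broaden
```

A rule like this only tells the agent which direction to move; choosing the actual reformulation is still left to the reasoning model.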
Key Novelty
Web Content Distribution Aware Search (WeDAS)
  • Introduces Query-Result Alignment Score (QRAS), a metric decomposing search utility into topical relevance, information density, and noise robustness
  • Uses a 'Content Distribution Probing' mechanism: effectively a few-shot 'sonar' that samples the query space to map information density before committing to a search path
  • Empowers agents to dynamically recalibrate sub-goals based on this feedback, ensuring queries land in high-utility information regions
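A minimal sketch of how a QRAS-style score and a probing step could fit together. The summary states only that QRAS decomposes search utility into topical relevance, information density, and noise robustness; the concrete formulas below (mean relevance, a saturating hit-count term, the fraction of hits above a relevance floor), the equal weights, the acceptance threshold, and the names `qras`, `probe`, and `SearchFn` are all assumptions for illustration, not the authors' definitions.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical search interface: query string -> relevance scores (in [0, 1])
# of the hits it returns. A real system would call a search API plus a scorer.
SearchFn = Callable[[str], List[float]]


def qras(scores: Sequence[float],
         weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
         target_hits: int = 10,
         noise_floor: float = 0.5) -> float:
    """Illustrative Query-Result Alignment Score in [0, 1] (not the paper's formula)."""
    if not scores:
        return 0.0  # retrieval sparsity: no content to align with
    relevance = sum(scores) / len(scores)                    # topical relevance
    density = min(len(scores), target_hits) / target_hits    # information density (saturating)
    robustness = sum(s >= noise_floor for s in scores) / len(scores)  # noise robustness
    w_rel, w_den, w_rob = weights
    return w_rel * relevance + w_den * density + w_rob * robustness


def probe(candidates: Sequence[str], search_fn: SearchFn,
          accept: float = 0.6) -> Tuple[Optional[str], Dict[str, float]]:
    """Probe each candidate query and return the best one if it clears `accept`.

    A None result signals that the agent should recalibrate its sub-goal
    (broaden or narrow it) and probe a fresh set of candidates.
    """
    report = {q: qras(search_fn(q)) for q in candidates}
    if not report:
        return None, report
    best = max(report, key=lambda q: report[q])
    return (best if report[best] >= accept else None), report


if __name__ == "__main__":
    # Toy index: one coarse, one hyper-specific, and one well-aligned query.
    toy_index = {
        "climate change": [0.2] * 50,
        "Bern rainfall 1987 report page 12 footnote 3": [],
        "Bern annual rainfall 1987": [0.9, 0.8, 0.7, 0.6, 0.4],
    }
    best, report = probe(list(toy_index), lambda q: toy_index[q])
    print(best)  # Bern annual rainfall 1987
    print(report)
```

Equal weights and a fixed acceptance threshold keep the sketch simple; the actual weighting and probe budget used by WeDAS are not given in this summary.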
Evaluation Highlights
  • +9.8% to +12.3% improvement in success rate on the GAIA benchmark compared to standard Deep Search baselines
  • Consistent performance gains across four benchmarks (GAIA, GPQA, HotpotQA, Bamboogle), with WeDAS also increasing the information gain of search trajectories
  • Outperforms Search-R1 and WebThinker baselines by significant margins on open-ended research tasks
Breakthrough Assessment
8/10
Addresses a fundamental 'blind spot' in agentic search: the lack of feedback on query searchability. The probing mechanism is a logical evolution from static retrieval to active environmental sensing.