Rethinking Deep Research from the Perspective of Web Content Distribution Matching

📝 Paper Summary

Agentic RAG pipeline, Web agents, Self-evolving, Agentic reasoning

WeDAS improves deep research agents by using a probing mechanism to estimate the alignment between queries and the web's information density, dynamically adjusting search granularity to avoid noise or sparsity.

Core Problem

Deep Search Agents suffer from a structural misalignment between reasoning-driven queries and the actual web index; coarse queries return noise while hyper-specific queries return nothing.

Why it matters:

Agents act blindly without perceiving the 'information density' of the web, leading to wasted search steps and hallucinations
Existing frameworks treat search engines as static utilities rather than dynamic environments requiring calibration
The gap between high-level reasoning plans and low-level retrieval precision causes failure in complex, open-ended research tasks

Concrete Example: On a GAIA task, a coarse query returns a flood of irrelevant results, while a hyper-specific query returns nothing (retrieval sparsity). Because the agent cannot sense that its query granularity is mismatched with the available web content, it follows a failed trajectory.

Key Novelty

Web Content Distribution Aware Search (WeDAS)

Introduces Query-Result Alignment Score (QRAS), a metric decomposing search utility into topical relevance, information density, and noise robustness
Uses a 'Content Distribution Probing' mechanism: effectively a few-shot 'sonar' that samples the query space to map information density before committing to a search path
Empowers agents to dynamically recalibrate sub-goals based on this feedback, ensuring queries land in high-utility information regions
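One way to picture the QRAS decomposition is as a weighted combination of three component scores. The sketch below is illustrative only: the class `QrasComponents`, the function `qras`, the [0, 1] score range, and the weights are all assumptions made for this example, not the paper's definition (the summary states only that QRAS covers topical relevance, information density, and noise robustness).

```python
from dataclasses import dataclass


@dataclass
class QrasComponents:
    """Hypothetical component scores, each assumed to lie in [0, 1]."""
    relevance: float         # topical relevance of the results to the query
    density: float           # share of results carrying useful information
    noise_robustness: float  # 1.0 means no distracting or off-topic results


def qras(c: QrasComponents, weights=(0.4, 0.4, 0.2)) -> float:
    """Combine the three components into one score (weights are assumed)."""
    w_rel, w_den, w_noise = weights
    total = w_rel + w_den + w_noise
    score = (w_rel * c.relevance
             + w_den * c.density
             + w_noise * c.noise_robustness)
    return score / total


# A coarse query: on topic, but drowned in noise.
coarse = QrasComponents(relevance=0.7, density=0.3, noise_robustness=0.2)
# A hyper-specific query: nothing comes back, so relevance and density are zero.
sparse = QrasComponents(relevance=0.0, density=0.0, noise_robustness=1.0)
# A calibrated query: relevant, dense, and mostly clean.
calibrated = QrasComponents(relevance=0.8, density=0.7, noise_robustness=0.8)

for name, c in [("coarse", coarse), ("sparse", sparse), ("calibrated", calibrated)]:
    print(f"{name}: {qras(c):.2f}")
```

Under these assumed weights the calibrated query scores highest and the empty-result query lowest, which is the ordering the framework relies on when it recalibrates sub-goals.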

Evaluation Highlights

+9.8% to +12.3% improvement in success rate on GAIA benchmark compared to standard Deep Search baselines
Consistent performance gains across four benchmarks (GAIA, GPQA, HotpotQA, Bamboogle), with WeDAS enhancing information gain in search trajectories
Outperforms Search-R1 and WebThinker baselines by significant margins on open-ended research tasks

Breakthrough Assessment

8/10

Addresses a fundamental 'blind spot' in agentic search—the lack of feedback on query searchability. The probing mechanism is a logical evolution from static retrieval to active environmental sensing.

⚙️ Technical Details

Problem Definition

Setting: Open-domain deep research task T requiring iterative interaction with a dynamic web environment W via a search engine f_SE

Inputs: High-level research objective x

Outputs: Final answer A synthesizing sub-results from multiple search steps

Pipeline Flow

Task Decomposition (Planner) → Sub-questions
Content Distribution Probing (WeDAS Module) → QRAS Estimation
Query Calibration/Refinement → Optimized Query
Execution & Reasoning Loop → Final Answer
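The four stages above can be sketched as a simple control loop. This is a minimal sketch under stated assumptions: `run_pipeline`, the threshold, the refinement cap, and the toy stand-ins (`toy_plan`, `toy_probe`, `toy_refine`, `toy_search`, `INDEX`) are hypothetical names introduced here, and the final answer-synthesis step is left out. In the paper these roles are filled by an LLM planner, the WeDAS prober, and a real search API.

```python
from typing import Callable, List


def run_pipeline(
    objective: str,
    plan: Callable[[str], List[str]],
    probe: Callable[[str], float],
    refine: Callable[[str], str],
    search: Callable[[str], List[str]],
    threshold: float = 0.5,
    max_refinements: int = 3,
) -> List[str]:
    """Plan, probe, calibrate, then execute each sub-question."""
    evidence: List[str] = []
    for sub_question in plan(objective):
        query = sub_question
        # Probe before committing; refine while the estimated score is low.
        for _ in range(max_refinements):
            if probe(query) >= threshold:
                break
            query = refine(query)
        evidence.extend(search(query))
    return evidence


# Toy stand-ins so the sketch runs end to end.
INDEX = {
    "wedas probing": ["doc about probing"],
    "qras metric": ["doc about QRAS"],
}


def toy_plan(objective: str) -> List[str]:
    return ["wedas probing exact internal details", "qras metric"]


def toy_search(query: str) -> List[str]:
    return INDEX.get(query, [])


def toy_probe(query: str) -> float:
    # Assumed proxy for QRAS: 1.0 if anything is retrievable, else 0.0.
    return 1.0 if toy_search(query) else 0.0


def toy_refine(query: str) -> str:
    # Broaden an over-specific query by dropping its last word.
    return " ".join(query.split()[:-1])


evidence = run_pipeline("explain WeDAS", toy_plan, toy_probe, toy_refine, toy_search)
print(evidence)
```

The first sub-question starts too specific and returns nothing; three broadening steps bring it to a query the toy index can answer, while the second sub-question passes the probe immediately.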

System Modules

Planner

Decomposes high-level task T into atomic sub-questions S

Model or implementation: MiroThinker-v1.0-30B

Content Distribution Prober (WeDAS (Novel))

Samples the space of candidate queries and iteratively estimates the alignment score using a limited number of search calls

Model or implementation: MiroThinker-v1.0-30B (as Meta-Evaluator)
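The probing idea can be illustrated with a budgeted sweep over query granularity. The sketch below is an assumption-laden toy, not the authors' algorithm: `granularity_variants`, `probe_query_space`, the even-spacing sampling rule, the in-memory `CORPUS`, and the `toy_score` heuristic are all hypothetical. It only shows the general shape of sampling a few queries between coarse and hyper-specific, scoring each, and keeping the best.

```python
from typing import Callable, List, Tuple


def granularity_variants(terms: List[str]) -> List[str]:
    """Build queries from coarse (first term only) to specific (all terms)."""
    return [" ".join(terms[:k]) for k in range(1, len(terms) + 1)]


def probe_query_space(
    terms: List[str],
    search: Callable[[str], List[str]],
    score: Callable[[str, List[str]], float],
    budget: int = 4,
) -> Tuple[str, float]:
    """Spend at most `budget` search calls and return the best-scoring query."""
    variants = granularity_variants(terms)
    n = len(variants)
    if n <= budget:
        sampled = variants
    else:
        # Spread the limited probes evenly across the granularity range.
        idx = sorted({round(i * (n - 1) / max(1, budget - 1)) for i in range(budget)})
        sampled = [variants[i] for i in idx]
    scored = [(q, score(q, search(q))) for q in sampled]
    return max(scored, key=lambda pair: pair[1])


# Toy environment: a document matches when it contains every query word.
CORPUS = [
    "solar panel efficiency overview",
    "solar panel efficiency perovskite 2023 record",
    "solar energy policy news",
    "solar eclipse viewing guide",
    "solar panel installation cost",
]


def toy_search(query: str) -> List[str]:
    words = query.split()
    return [d for d in CORPUS if all(w in d.split() for w in words)]


def toy_score(query: str, results: List[str]) -> float:
    # Assumed heuristic: zero hits score 0; otherwise prefer few, focused hits.
    return 0.0 if not results else 1.0 / len(results)


terms = ["solar", "panel", "efficiency", "perovskite", "tandem"]
best_query, best_score = probe_query_space(terms, toy_search, toy_score, budget=4)
print(best_query, best_score)
```

In this toy corpus the single-word query matches everything and the five-word query matches nothing; the probe settles on the four-word query, which returns exactly one focused document.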

Meta-Evaluator (WeDAS (Novel))

Computes QRAS based on retrieved snippets (relevance, density, noise) and provides feedback

Model or implementation: MiroThinker-v1.0-30B
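To make the snippet-to-feedback step concrete, here is a crude lexical stand-in for the Meta-Evaluator. In the paper an LLM judges the snippets; the term-overlap rules, the 0.5 thresholds, and the names `component_scores` and `feedback` below are assumptions chosen purely for illustration.

```python
from typing import List, Tuple


def component_scores(
    query: str, snippets: List[str], min_overlap: float = 0.5
) -> Tuple[float, float, float]:
    """Return (relevance, density, noise) from snippets using term overlap.

    All three definitions are assumed stand-ins for an LLM's judgment.
    """
    terms = set(query.lower().split())
    if not snippets or not terms:
        return 0.0, 0.0, 0.0
    overlaps = [len(terms & set(s.lower().split())) / len(terms) for s in snippets]
    relevance = sum(overlaps) / len(overlaps)   # mean query-term coverage
    on_topic = sum(o >= min_overlap for o in overlaps)
    density = on_topic / len(snippets)          # share of on-topic snippets
    noise = 1.0 - density                       # share of off-topic snippets
    return relevance, density, noise


def feedback(query: str, snippets: List[str]) -> str:
    """Turn the scores into a coarse calibration hint for the agent."""
    if not snippets:
        return "broaden"   # retrieval sparsity: the query is too specific
    _, _, noise = component_scores(query, snippets)
    if noise > 0.5:
        return "narrow"    # mostly off-topic results: the query is too coarse
    return "keep"


noisy = [
    "Perovskite solar cell efficiency record",
    "Solar eclipse guide",
    "Best pizza in town",
    "Stock market news",
]
clean = ["Perovskite solar cell efficiency record", "perovskite efficiency gains"]
query = "perovskite solar efficiency"
print(feedback(query, noisy), feedback(query, clean), feedback(query, []))
```

The three hints map onto the failure modes described earlier: empty results call for a broader query, noisy results for a narrower one.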

Search Engine

Executes queries and returns ranked document lists

Model or implementation: Serper Google Search API

Novel Architectural Elements

Integration of a 'Content Distribution Probing' loop before final query execution
Meta-evaluator module explicitly scoring query-result alignment (QRAS) to guide trajectory calibration
Feedback loop that recalibrates sub-goals based on local content landscape density

Modeling

Base Model: MiroThinker-v1.0-30B (based on Llama-3)

Reproducibility

Code: https://github.com/Rethinking-Deep-Research/WeDAS

📚 Prerequisite Knowledge

Prerequisites

Understanding of agentic workflows (planning, reasoning, acting)
Familiarity with basic information retrieval and statistical measures (TF-IDF, KL divergence)
Basic knowledge of Large Language Models (LLMs) as stochastic policies

Key Terms

QRAS: Query-Result Alignment Score—a metric quantifying the congruence between search queries and the prevailing information landscape based on relevance, density, and noise

WeDAS: Web Content Distribution Aware Search—the proposed framework that uses probing to map web information topography

Deep Search Agent: An autonomous agent capable of long-horizon planning and iterative retrieval to solve complex, open-ended problems

TF-IDF: Term Frequency-Inverse Document Frequency—a statistical measure used to evaluate how important a word is to a document in a collection
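As a quick illustration of the definition, the sketch below computes one common TF-IDF variant (length-normalized term frequency times log(N / df)). Other weightings and smoothing schemes exist; this particular choice, the function name `tf_idf`, and the toy corpus are assumptions for the example and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """TF-IDF with length-normalized term frequency and idf = log(N / df)."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    ["web", "search", "agent"],
    ["web", "index", "noise"],
    ["web", "agent", "planning"],
]
# "web" appears in every document, so it carries no discriminating weight.
print(tf_idf("web", corpus[0], corpus))
# "search" appears in only one document, so it is weighted more heavily.
print(tf_idf("search", corpus[0], corpus))
```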

KL divergence: Kullback-Leibler divergence—a measure of how one probability distribution differs from a second, reference probability distribution
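The definition can be checked numerically with a few lines of code. This is a generic sketch of discrete KL divergence in nats, not something specific to the paper; the function name and the example distributions are illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats for discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # terms with p_i == 0 contribute nothing
        if qi == 0:
            return math.inf   # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # about 0.51
print(kl_divergence(q, p))  # about 0.37, showing the measure is asymmetric
print(kl_divergence(p, p))  # 0.0 for identical distributions
```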

EIG: Expected Information Gain—the expected reduction in entropy (uncertainty) about the ground truth answer provided by a search action
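A small worked example may help here: EIG is the prior entropy minus the expected entropy of the posterior over the possible search outcomes. The sketch below assumes a discrete set of candidate answers and known outcome probabilities, which is a simplification; the function names and the numbers are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_information_gain(
    prior: Sequence[float],
    outcomes: List[Tuple[float, Sequence[float]]],
) -> float:
    """Prior entropy minus expected posterior entropy.

    `outcomes` holds (probability of the outcome, posterior given the outcome).
    """
    expected_posterior = sum(w * entropy(post) for w, post in outcomes)
    return entropy(prior) - expected_posterior


# Four equally likely candidate answers: 2 bits of uncertainty.
prior = [0.25, 0.25, 0.25, 0.25]
# An informative search halves the candidate set whichever way it turns out.
informative = [(0.5, [0.5, 0.5, 0.0, 0.0]), (0.5, [0.0, 0.0, 0.5, 0.5])]
# An uninformative search leaves the belief unchanged.
uninformative = [(1.0, prior)]

print(expected_information_gain(prior, informative))    # 1.0 bit
print(expected_information_gain(prior, uninformative))  # 0.0 bits
```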

MiroThinker: The backbone agent model used in the experiments (MiroThinker-v1.0-30B)

Serper: A Google Search API used as the retrieval function in the experiments

Jaccard Similarity: A statistic measuring the similarity of two sets as the size of their intersection divided by the size of their union

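Levenshtein Similarity: A similarity score derived from Levenshtein edit distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one sequence into another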
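Both string-similarity measures are short enough to write out. The sketch below compares queries at the word level for Jaccard and at the character level for Levenshtein; normalizing the edit distance by the longer string's length is one common convention and is an assumption here, as is how the paper applies either measure.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Intersection over union of the two queries' lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty queries are treated as identical
    return len(sa & sb) / len(sa | sb)


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1] by dividing by the longer length (assumed)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


print(jaccard_similarity("web search agent", "web agent planning"))  # 0.5
print(levenshtein_distance("kitten", "sitting"))                     # 3
print(levenshtein_similarity("kitten", "sitting"))
```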

GAIA: A benchmark for General AI Assistants that requires reasoning, tool use, and multimodal handling

GPQA: Graduate-Level Google-Proof Q&A, a challenging benchmark of graduate-level multiple-choice questions written to resist simple web lookup

Bamboogle: A dataset of questions requiring multi-step reasoning and retrieval where the answer is not immediately obvious from a single search