QRAS: Query-Result Alignment Score—a metric quantifying the congruence between search queries and the prevailing information landscape based on relevance, density, and noise
WeDAS: Web Content Distribution Aware Search—the proposed framework that uses probing to map web information topography
Deep Search Agent: Autonomous agents capable of long-horizon planning and iterative retrieval to solve complex open-ended problems
TF-IDF: Term Frequency-Inverse Document Frequency—a statistical measure used to evaluate how important a word is to a document in a collection
KL divergence: Kullback-Leibler divergence—a measure of how one probability distribution differs from a second, reference probability distribution
EIG: Expected Information Gain—the expected reduction in entropy (uncertainty) about the ground truth answer provided by a search action
MiroThinker: The backbone agent model used in the experiments (based on Llama-3-30B-Instruct)
Serper: A Google Search API used as the retrieval function in the experiments
Jaccard Similarity: A statistic used for gauging the similarity and diversity of sample sets (intersection over union)
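Levenshtein Similarity: A similarity score derived from the Levenshtein (edit) distance, the minimum number of single-character insertions, deletions, or substitutions needed to transform one sequence into another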
GAIA: A benchmark for General AI Assistants that requires reasoning, tool use, and multi-modality
GPQA: A challenging benchmark of graduate-level, "Google-proof" questions that remain hard to answer correctly even with unrestricted web search
Bamboogle: A dataset of questions requiring multi-step reasoning and retrieval where the answer is not immediately obvious from a single search
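As a rough illustration of the TF-IDF entry, the sketch below computes weights for a few whitespace-tokenized toy documents. It assumes one common variant: term frequency normalized by document length, and unsmoothed idf = log(N / df). The glossary does not specify which variant the paper uses, and libraries such as scikit-learn apply idf smoothing by default.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for tokenized documents.

    Assumed variant (one of several in common use):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / df(t)), where df(t) = number of docs containing t
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "deep search agent".split(),
    "search api returns results".split(),
    "agent plans and retrieves".split(),
]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document gets idf = log(1) = 0 and therefore zero weight. In the first toy document, "deep" outweighs "search" because "search" also occurs in the second document.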
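A minimal sketch of the KL divergence entry for discrete distributions over the same outcomes, using natural logarithms (nats); base 2 would give bits instead. The function name and the toy distributions are illustrative only.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    P and Q are discrete distributions over the same outcomes (each sums to 1).
    Terms with p_i == 0 contribute 0 by convention; if some q_i == 0 while
    p_i > 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.7, 0.2, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric: the two values differ
```

Swapping the arguments changes the value, which is why the definition singles out a reference distribution: KL divergence is not a symmetric distance.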
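The EIG entry can be read as EIG = H(Y) − E_o[H(Y | o)], where Y is the unknown ground-truth answer and o is the observation a search action returns. The toy sketch below assumes a discrete prior over candidate answers and a hand-specified likelihood table p(o | y); both are hypothetical stand-ins, since the glossary does not say how the paper estimates these quantities.

```python
import math


def entropy(dist: list[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def expected_information_gain(prior: list[float],
                              likelihood: list[list[float]]) -> float:
    """EIG = H(Y) - E_o[H(Y | o)] for one search action.

    prior[y]         : belief that candidate answer y is the ground truth
    likelihood[y][o] : hypothetical probability that the search returns
                       observation o when y is the true answer (rows sum to 1)
    """
    n_answers = len(prior)
    n_obs = len(likelihood[0])
    expected_posterior_entropy = 0.0
    for o in range(n_obs):
        # Marginal probability of observing o under the current belief.
        p_o = sum(prior[y] * likelihood[y][o] for y in range(n_answers))
        if p_o == 0:
            continue
        # Bayes' rule: posterior over candidate answers after observing o.
        posterior = [prior[y] * likelihood[y][o] / p_o for y in range(n_answers)]
        expected_posterior_entropy += p_o * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


uniform = [0.5, 0.5]
print(expected_information_gain(uniform, [[1.0, 0.0], [0.0, 1.0]]))  # decisive search -> 1.0
print(expected_information_gain(uniform, [[0.5, 0.5], [0.5, 0.5]]))  # uninformative search -> 0.0
```

A search whose outcome decisively identifies the answer recovers the full prior entropy (1 bit for two equally likely answers), while one whose outcome is independent of the answer yields zero. Equivalently, EIG is the mutual information between the answer and the observation, i.e. E_o[D_KL(p(y | o) || p(y))].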
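For the Jaccard Similarity entry, a minimal sketch over token sets. Whitespace tokenization and the convention that two empty sets have similarity 1.0 are assumptions made for illustration, not details taken from the paper.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; defined here as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


q1 = set("deep search agent benchmark".split())
q2 = set("deep search agent evaluation".split())
print(jaccard_similarity(q1, q2))  # 3 shared tokens / 5 distinct tokens -> 0.6
```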
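For the Levenshtein Similarity entry, the sketch computes the edit distance with the standard Wagner-Fischer dynamic program and then maps it to a similarity in [0, 1] by dividing by the longer string's length. That normalization is a common choice but an assumption here; the glossary does not state how the paper converts distance into similarity.

```python
def levenshtein_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions
    turning s into t (Wagner-Fischer dynamic programming, two rows of memory)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / max(len(s), len(t))."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein_distance(s, t) / max(len(s), len(t))


print(levenshtein_distance("kitten", "sitting"))  # -> 3
print(levenshtein_similarity("kitten", "sitting"))
```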