Systematic Collation: The ability to visit disparate sources and aggregate fragmented information into a single master list.
Entity Resolution: Identifying when two retrieved entities are identical despite having different names or surface forms (de-duplication).
Stopping Criteria: The decision-making process by which an agent determines that it has found all relevant answers and ends the search.
Hedging: A failure mode where an agent provides multiple candidate answers (e.g., 'Brazil and Italy') instead of committing to the single correct one.
F1 Score: The harmonic mean of Precision and Recall, used here to measure the quality of the retrieved answer set against the ground truth.
Deep Research Agent: An autonomous agent designed to execute complex search plans, manage memory, and perform multi-step reasoning over long horizons.
Comprehensiveness Gap: The disparity between an agent's ability to retrieve a single fact and its ability to generate an exhaustive list of all relevant items.
Fully Correct: A metric category where the agent's submitted set is semantically identical to the ground truth (Recall=1.0, Precision=1.0).
Fully Incorrect: A metric category where the intersection between the submitted answer set and the ground truth is empty.
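
The set-based metrics defined above (F1 Score, Fully Correct, Fully Incorrect) can be illustrated with a minimal Python sketch. It is not the evaluation code behind these definitions. The `normalize` helper is a crude, hypothetical stand-in for Entity Resolution (case-folding and whitespace cleanup only), and the `partially_correct` label is an assumed name for every result that falls between the two defined categories.

```python
from typing import Iterable


def normalize(name: str) -> str:
    """Hypothetical stand-in for entity resolution: case-fold, collapse whitespace."""
    return " ".join(name.casefold().split())


def score(predicted: Iterable[str], truth: Iterable[str]) -> dict:
    """Compare a submitted answer set against the ground-truth set."""
    pred = {normalize(p) for p in predicted}
    gold = {normalize(t) for t in truth}
    overlap = len(pred & gold)

    # Precision: share of submitted items that are correct.
    precision = overlap / len(pred) if pred else 0.0
    # Recall: share of ground-truth items that were found.
    recall = overlap / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall (0 when nothing overlaps).
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0

    if overlap == 0:
        category = "fully_incorrect"    # empty intersection
    elif pred == gold:
        category = "fully_correct"      # precision = recall = 1.0
    else:
        category = "partially_correct"  # assumed label, not from the glossary

    return {"precision": precision, "recall": recall, "f1": f1, "category": category}


if __name__ == "__main__":
    # Two of three ground-truth items found, no wrong items submitted.
    print(score(["Brazil", "italy "], ["Brazil", "Italy", "Germany"]))
```

In this example precision is 1.0 and recall is 2/3, so F1 is about 0.8 and the result is neither Fully Correct nor Fully Incorrect. A real evaluation would presumably need a stronger matching step than string normalization to treat different surface forms of the same entity as identical.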