FAC: Factual Acquisition Capacity—an agent's ability to retrieve, validate, and integrate external knowledge via tools.
LRF: Logical Reasoning Fidelity—an agent's capability to maintain rigorous causal relationships and deduction chains during problem-solving.
GAIA: A benchmark for General AI Assistants that poses complex, multi-step questions requiring reasoning, tool use, and multi-modality.
BrowseComp: A benchmark designed to evaluate web-browsing agents on questions whose answers are hard to find and require persistent, multi-step navigation.
ReAct: Reasoning + Acting—a paradigm where agents interleave reasoning traces with actions, so that each action follows explicit reasoning and each observation informs the next step.
MCTS: Monte Carlo Tree Search—a search algorithm that builds a tree of possible future states through sampled rollouts, balancing exploration and exploitation to select promising actions.
RAG: Retrieval-Augmented Generation—enhancing model responses by retrieving relevant external documents.
Reflect: A mechanism where the agent analyzes past actions or observations to improve future performance.
CDX API: An API provided by the Wayback Machine to query historical web captures.
MCP Box: Model Context Protocol Box—a standardized way for AI models to interact with external data and tools (mentioned as concurrent work).
Test-Time Scaling: Techniques applied during inference (such as sampling multiple reasoning paths or self-correction) to improve performance without retraining.
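
To make the Test-Time Scaling entry concrete, the sketch below shows one simple form of it: sample several candidate answers and keep the most frequent one (majority voting). This is only an illustration, not the method of any particular system. The names `majority_vote` and `sample_and_vote` are hypothetical, and `sample_fn` stands in for whatever call produces a single answer from an agent.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def sample_and_vote(sample_fn: Callable[[], str], n_samples: int) -> str:
    """Call a (hypothetical) answer sampler n_samples times and vote.

    Answers are lightly normalized (trimmed, lowercased) so that trivial
    formatting differences do not split the vote.
    """
    answers = [sample_fn().strip().lower() for _ in range(n_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Canned answers stand in for independent agent runs (assumption).
    canned = iter(["Paris", "paris ", "Lyon", "Paris", "Marseille"])
    print(sample_and_vote(lambda: next(canned), n_samples=5))
```

More samples cost more inference compute but require no retraining, which is the trade-off the term describes.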