Agentic Search: Search systems where an LLM agent iteratively plans, executes tool calls (search/filter), and synthesizes answers rather than just retrieving documents
Underspecified queries: Queries where user preferences are vague or implicit (e.g., 'good vibe'), requiring the system to infer the user's intent or fall back on general norms
Clarification: A hidden ground-truth note written by the query author explaining their specific intent, used by the judge to evaluate relevance but not shown to the agent
Budget Oracle: A theoretical upper bound obtained by selecting the model combination that maximizes accuracy under a fixed total monetary budget
Quality Oracle: A theoretical upper bound obtained by selecting, for each query, the cheapest model that reaches the highest achievable accuracy on that query
LangGraph: A library for building stateful, multi-actor applications with LLMs, used here to orchestrate the agent's plan-retrieve-filter loop
Reranker: A model that rescores the initial set of retrieved documents to improve precision before final answer generation
BM25: A probabilistic ranking function based on term frequency, inverse document frequency, and document-length normalization, used as a baseline sparse retriever
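
To make the BM25 entry concrete, the following is a minimal sketch of the standard Okapi BM25 scoring formula in plain Python. It is illustrative only and is not taken from the system described here: the function name `bm25_scores`, the pre-tokenized list inputs, and the default parameters `k1=1.5` and `b=0.75` are assumptions chosen for the example, and the IDF uses the common "+1 inside the log" smoothing variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document.

    `query` is a list of tokens and `docs` is a list of token lists.
    `k1` controls term-frequency saturation and `b` controls
    document-length normalization (both are assumed defaults).
    """
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency damped by k1 and normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["cozy", "cafe", "good", "vibe"],
        ["loud", "bar"],
        ["good", "coffee", "cafe"],
    ]
    for doc, score in zip(docs, bm25_scores(["good", "vibe"], docs)):
        print(f"{score:.3f}  {' '.join(doc)}")
```

Because the score depends only on exact token overlap, a document that expresses "good vibe" in different words scores zero, which is one reason a sparse retriever like this serves as a baseline for underspecified queries.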