OEDR: Open-Ended Deep Research—complex challenges where agents synthesize vast web-scale information into reports.
Planner: An agent role responsible for iteratively searching the web and optimizing the report outline based on findings.
Writer: An agent role responsible for retrieving specific evidence for each outline section and generating text.
RACE: A metric suite for Report Quality assessing Comprehensiveness, Insight, Instruction-Following, and Readability.
FACT: A metric suite for Web Retrieval assessing Citation Accuracy and Average Effective Citations.
Lost-in-the-middle: A phenomenon where LLMs fail to retrieve or use information located in the middle of a long input context.
SFT: Supervised Fine-Tuning—training a model on a labeled dataset to improve specific capabilities.
Contextual Bleeding: When information from one section of a text generation task incorrectly influences or merges with the synthesis of another section.