RAG: Retrieval-Augmented Generation, a technique that grounds LLM responses in documents retrieved from an external source (e.g., a search engine or vector index) at inference time
Agentic Deep Research: A paradigm where AI agents autonomously plan, execute, and refine multi-step research tasks using reasoning and iterative search
Test-Time Scaling (TTS): Allocating more computational resources during inference (e.g., generating more reasoning steps) to improve performance
CoT: Chain-of-Thought, a prompting technique in which an LLM generates intermediate reasoning steps before producing its final answer
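RLHF: Reinforcement Learning from Human Feedback, a fine-tuning method that trains a reward model on human preference judgments and then optimizes the LLM against that reward model with RL, aligning its outputs with human preferences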
ReAct: Reasoning + Acting, a prompting method in which an LLM interleaves reasoning traces with actions (e.g., search or tool calls) and uses the resulting observations to guide its next reasoning step
SFT: Supervised Fine-Tuning, further training a pretrained model on labeled input-output examples
BrowseComp: A benchmark evaluating an agent's ability to persistently search and browse the web over many steps to find hard-to-locate information, with short answers that are easy to verify
HLE: Humanity's Last Exam, a benchmark of expert-level questions across diverse domains, each requiring deep synthesis of specialized knowledge
RL: Reinforcement Learning, a training method in which an agent learns a policy through trial and error, guided by reward signals from its environment
Hallucination: Plausible-sounding but factually incorrect content generated by an LLM
Multi-hop reasoning: Solving problems that require connecting pieces of information from multiple distinct sources or steps
DeepSeek-R1: An open-weight reasoning model from DeepSeek, trained with large-scale reinforcement learning to produce long reasoning chains before answering
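The ReAct and multi-hop reasoning entries above can be made concrete with a small sketch. This is a minimal illustration under stated assumptions, not any particular system's implementation. `scripted_model` is a hypothetical stand-in for an LLM with a hard-coded policy. `search` is a hypothetical exact-match lookup over a two-entry toy corpus. The `Thought:`/`Action:`/`Observation:` format with `search[...]` and `finish[...]` actions is an assumed convention. Answering the question takes two hops: first find the author, then find the author's birthplace.

```python
import re
from typing import Callable, List, Optional

# Hypothetical toy corpus standing in for a real search engine or retriever.
CORPUS = {
    "Pride and Prejudice": "Pride and Prejudice is an 1813 novel by Jane Austen.",
    "Jane Austen": "Jane Austen was born in Steventon, Hampshire, England.",
}

# Assumed action syntax: "Action: tool[argument]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def search(query: str) -> str:
    """Hypothetical search tool: exact-title lookup in the toy corpus."""
    return CORPUS.get(query, "No results.")


def scripted_model(transcript: List[str]) -> str:
    """Hypothetical stand-in for an LLM that emits one Thought/Action step.

    A real agent would prompt a model with the transcript. This policy is
    hard-coded so the loop runs offline, but each hop still reads the
    previous observation instead of knowing the answer in advance.
    """
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return "Thought: I need the author first.\nAction: search[Pride and Prejudice]"
    last = observations[-1][len("Observation: "):]
    if len(observations) == 1:
        # Hop 1 result: extract the author's name from the observation.
        author = last.split(" by ", 1)[1].rstrip(".")
        return f"Thought: The author is {author}.\nAction: search[{author}]"
    # Hop 2 result: extract the birthplace and finish.
    place = last.split("born in ", 1)[1].rstrip(".")
    return f"Thought: I have both facts.\nAction: finish[{place}]"


def react(question: str, model: Callable[[List[str]], str], max_steps: int = 5) -> Optional[str]:
    """Interleave model reasoning with tool calls until `finish` or the step cap."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(step)
        match = ACTION_RE.search(step)
        if match is None:
            transcript.append("Observation: Could not parse an action.")
            continue
        tool, arg = match.groups()
        if tool == "finish":
            return arg
        if tool == "search":
            # Feed the tool result back so the next reasoning step can use it.
            transcript.append(f"Observation: {search(arg)}")
        else:
            transcript.append(f"Observation: Unknown tool '{tool}'.")
    return None  # step budget exhausted without an answer


answer = react("Where was the author of Pride and Prejudice born?", scripted_model)
print(answer)  # Steventon, Hampshire, England
```

To turn this into a working agent, you could replace `scripted_model` with a real model call and `search` with a web search API or retriever. That yields the basic retrieve-reason-act loop the RAG and agentic deep research entries describe. The `max_steps` argument is a crude cap on how much test-time compute the agent may spend.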