RAG: Retrieval-Augmented Generation—AI systems that answer questions by searching for relevant documents before generating a response
LRM: Large Reasoning Models—LLMs that perform complex, multi-step deduction by spending additional compute at inference time, i.e., test-time scaling (e.g., OpenAI o1, DeepSeek-R1)
CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps before the final answer
Reasoning-Augmented Retrieval: Using reasoning capabilities to optimize the retrieval process (e.g., decomposing complex queries, verifying document relevance)
Retrieval-Augmented Reasoning: Using external knowledge to support and verify the model's internal deductive processes
ORM: Outcome Reward Model—a reward model that scores only the quality of the final generated answer
PRM: Process Reward Model—a reward model that scores the quality of each intermediate reasoning step
PPO: Proximal Policy Optimization—an RL algorithm used to train models by updating policies in stable, clipped steps
MCTS: Monte Carlo Tree Search—a search algorithm used to explore reasoning paths by simulating future outcomes
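
To make the "clipped steps" in the PPO entry concrete, here is a minimal sketch of the standard per-sample clipped surrogate term, min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The function name `ppo_clipped_term` and the default `epsilon=0.2` are illustrative assumptions, not taken from the source.

```python
def ppo_clipped_term(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (hypothetical helper for illustration).

    ratio     -- pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage -- advantage estimate for that action
    epsilon   -- clip range; 0.2 is an assumed, commonly used default
    """
    # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the unclipped and clipped objectives,
    # so a large policy change earns no extra credit.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Positive advantage: gains are capped once the ratio exceeds 1 + epsilon.
    print(ppo_clipped_term(1.5, 1.0))
    # Negative advantage: the penalty is not reduced by clipping.
    print(ppo_clipped_term(1.5, -1.0))
```

In training, this term is averaged over sampled actions and maximized; the clipping is what keeps each policy update close to the previous policy.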