Large Reasoning Models (LRMs): LLMs explicitly trained (often via reinforcement learning, RL) to generate long internal "thinking" traces before emitting a final answer
thinking tokens: Tokens generated during the model's internal reasoning process (CoT) that are not part of the final user-visible answer
inference token compute: The total computational budget a model spends during generation, roughly proportional to the number of tokens generated
Chain-of-Thought (CoT): A prompting technique where models generate intermediate reasoning steps to improve problem-solving
pass@k: A metric measuring the probability that at least one correct solution is generated out of k independent attempts
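In practice, pass@k is usually estimated from n sampled attempts of which c are correct, using the unbiased estimator popularized by the Codex paper (Chen et al., 2021). A minimal sketch (the function name is illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    solutions drawn without replacement from n attempts (c of them correct)
    is correct, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 attempts, 3 correct; pass@1 reduces to c/n = 0.3
print(pass_at_k(10, 3, 1))
```

Computing the naive fraction of successful k-attempt batches instead would waste samples and add variance; the combinatorial form uses all n attempts at once.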
planning tasks: Problems requiring a sequence of interdependent actions to reach a goal state, often requiring lookahead
constraint satisfaction: Problems where the solution must satisfy a set of strict rules or constraints (e.g., River Crossing rules)
overthinking phenomenon: A behavior where models produce verbose, redundant reasoning traces even for simple problems or after finding the solution
data contamination: When test data (or very similar examples) is included in the model's training set, inflating performance metrics