LRM: Large Reasoning Model—an LLM trained to generate internal reasoning chains (thoughts) before producing a final answer
Rumination: The tendency of the model to redundantly re-verify previously explored problem formulations or assumptions without making new progress
Bloom Cycle: The initial phase of reasoning where the model decomposes the problem and generates a preliminary interim solution to be refined later
Reconstruction Cycle: Subsequent reasoning phases where the model reconsiders its initial assumptions or solution, triggered by tokens like 'Wait' or 'Alternatively'
Thoughtology: The systematic study of the internal reasoning behaviors, patterns, and limitations of Large Reasoning Models
SFT: Supervised Fine-Tuning—training a model on labeled examples (inputs and target outputs) to teach it specific behaviors or formats
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm used to train DeepSeek-R1; it estimates advantages by comparing groups of sampled responses rather than using a learned value critic, likely optimizing for reasoning correctness
CoT: Chain-of-Thought—a prompting technique or model capability where the system generates intermediate reasoning steps before the final answer
Inference-time scaling: The observation that allowing a model to spend more computation (generate more reasoning tokens) at test time tends to improve performance
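To make the GRPO entry above concrete, here is a minimal sketch of its core idea—the group-relative advantage. This is an illustration of the normalization step only, not DeepSeek's actual training code; the use of the population standard deviation and the zero-variance fallback are assumptions of this sketch.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: each sampled response's reward is
    normalized against the mean and std of its own group, so no
    learned value critic is needed (the core idea of GRPO).
    Assumes population std; falls back to 1.0 when all rewards match."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to one prompt, scored by a rule-based reward
# (e.g. 1.0 for a correct final answer, 0.0 otherwise):
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # → [1.0, -1.0, -1.0, 1.0]
```

Responses scored above the group mean get positive advantages (their token probabilities are pushed up during the policy update), and below-mean responses get negative ones.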