Speculative Thinking: The proposed framework where a large model intervenes in a small model's generation at specific reasoning delimiters (e.g., newlines) to correct or guide thoughts.
Speculative Decoding: A technique for accelerating inference in which a small model drafts tokens and a large model verifies them in parallel; it operates at the token level.
Affirmation/Reflection Takeover: Mechanism where the large model takes over generation if the small model outputs a delimiter followed by affirmation (e.g., 'yes') or reflection (e.g., 'wait') keywords.
Verification Takeover: Mechanism where the large model takes over if keywords like 'verify' or 'double-check' appear after a delimiter.
Excessive Reflection Takeover: Mechanism that forces a handover to the large model if the small model reflects or backtracks too many times (tracked by a counter c).
Reasoning-supportive tokens: Tokens like 'wait', 'hmm', 'alternatively' that signal self-correction or internal monologue.
DeepSeek-distilled Qwen-2.5: The specific family of reasoning models used in the paper (distilled from DeepSeek-R1), available in sizes such as 1.5B, 7B, and 32B.
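
The three takeover mechanisms defined above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name classify_segment, the threshold max_reflections, and the matching rules (prefix match for affirmation and reflection, substring match for verification) are assumptions made for illustration. The keyword tuples contain only the examples quoted in the definitions, and the sketch assumes the counter c is incremented once per reflection keyword.

```python
from typing import Optional, Tuple

# Keywords copied from the glossary examples; the full lists are not given
# in the source, so these tuples are illustrative assumptions.
AFFIRMATION = ("yes",)
REFLECTION = ("wait",)
VERIFICATION = ("verify", "double-check")


def classify_segment(
    segment: str, c: int, max_reflections: int = 3
) -> Tuple[Optional[str], int]:
    """Decide which takeover (if any) fires for the text after a delimiter.

    segment: text the small model produced right after a delimiter.
    c: number of reflections counted so far.
    max_reflections: hypothetical threshold, not a value from the source.
    Returns (trigger name or None, updated counter).
    """
    text = segment.lstrip().lower()
    # Naive prefix match: a word like "yesterday" would also match "yes".
    is_reflection = text.startswith(REFLECTION)
    if is_reflection:
        c += 1
        if c > max_reflections:
            return "excessive_reflection", c
    if is_reflection or text.startswith(AFFIRMATION):
        return "affirmation_reflection", c
    if any(keyword in text for keyword in VERIFICATION):
        return "verification", c
    return None, c


if __name__ == "__main__":
    # Newlines serve as the reasoning delimiter, as in the glossary example.
    draft = "Compute 2 + 2.\nWait, is that right?\nLet me double-check the sum."
    counter = 0
    for part in draft.split("\n"):
        trigger, counter = classify_segment(part, counter)
        print(repr(part), "->", trigger)
```

In a full system, a non-None trigger would hand the next segment to the large model; that handover, and the models themselves, are deliberately left out of the sketch.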