CoT: Chain-of-Thought, a technique where models generate intermediate reasoning steps before the final answer
MLLM: Multimodal Large Language Model, an AI model capable of processing both text and visual inputs (e.g., images)
Thinking Token: A special token (e.g., <Thinking_of_Reasoning>) introduced by Heima whose hidden state encodes an entire reasoning step, so that a single token replaces the verbose textual step
Latent Space: The internal vector representation of data within a neural network, as opposed to the explicit textual output
KV Cache: Key-Value Cache, a mechanism in Transformers that stores the attention keys and values of previously processed tokens so they are reused, rather than recomputed, at each new decoding step
Heima Encoder: The reasoning model (an MLLM) that processes the visual and textual inputs and generates compact thinking tokens followed by the final answer
Heima Decoder: A standard LLM trained to take the hidden state of a thinking token and reconstruct the original textual reasoning step
Progressive Encoding: A training strategy in which the number of CoT stages compressed into thinking tokens is gradually increased from 0 to K, where K is the total number of reasoning stages
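To make the KV Cache entry concrete, here is a minimal sketch of a single attention head, assuming one layer, no positional encoding, and toy 2-dimensional weights (all names such as `KVCache`, `cached_step`, and `uncached_step` are hypothetical, not from Heima or any library). It checks that reusing cached keys and values gives the same output as re-projecting the whole prefix at every step.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    # Linear projection without bias: each row of w yields one output entry.
    return [dot(row, x) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention of one query over all visible positions.
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Hypothetical cache holding the key and value of every token seen so far."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def __len__(self) -> int:
        return len(self.keys)


def cached_step(x: Vector, wq: Matrix, wk: Matrix, wv: Matrix, cache: KVCache) -> Vector:
    # Only the newest token is projected; earlier keys/values are reused.
    cache.keys.append(matvec(wk, x))
    cache.values.append(matvec(wv, x))
    return attend(matvec(wq, x), cache.keys, cache.values)


def uncached_step(xs: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> Vector:
    # Baseline without a cache: re-project every prefix token at each step.
    keys = [matvec(wk, x) for x in xs]
    values = [matvec(wv, x) for x in xs]
    return attend(matvec(wq, xs[-1]), keys, values)


if __name__ == "__main__":
    wq = [[0.5, -0.2], [0.1, 0.3]]
    wk = [[0.2, 0.4], [-0.3, 0.1]]
    wv = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

    cache = KVCache()
    for t, x in enumerate(tokens):
        fast = cached_step(x, wq, wk, wv, cache)
        slow = uncached_step(tokens[: t + 1], wq, wk, wv)
        assert all(abs(a - b) < 1e-12 for a, b in zip(fast, slow))
    print(len(cache))  # 3
```

The cached path does one key/value projection per new token, while the uncached path repeats the projections for the entire prefix at every step; this saved work is what speeds up generation. Fewer generated tokens (as with thinking tokens replacing verbose text) also means a smaller cache.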
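The Thinking Token and Progressive Encoding entries can be illustrated together with a small sketch of how training targets might change across phases. It assumes that stages are compressed in order from first to last and that each stage maps to one thinking token named after the <Thinking_of_Reasoning> pattern; the helper names and stage labels below are hypothetical examples, not Heima's actual code or data.

```python
from typing import List, Tuple


def thinking_token(stage: str) -> str:
    # Hypothetical naming scheme modeled on <Thinking_of_Reasoning>.
    return f"<Thinking_of_{stage}>"


def progressive_targets(stages: List[Tuple[str, str]], answer: str) -> List[List[str]]:
    """Return one training target per phase s = 0..K.

    In phase s, the first s CoT stages are replaced by their thinking
    tokens and the remaining K - s stages stay as plain text.
    """
    k = len(stages)
    targets = []
    for s in range(k + 1):
        seq = [thinking_token(name) if i < s else text
               for i, (name, text) in enumerate(stages)]
        targets.append(seq + [answer])
    return targets


if __name__ == "__main__":
    # Example stages (assumed for illustration only).
    cot = [
        ("Summary", "I will describe the image, then answer the question."),
        ("Caption", "The image shows three red apples on a table."),
        ("Reasoning", "Counting the apples gives 3."),
    ]
    for s, target in enumerate(progressive_targets(cot, "3")):
        print(s, target)
```

Phase 0 is ordinary CoT training; by phase K every reasoning stage is a single thinking token, so the encoder emits K compact tokens followed by the answer. Compressing gradually rather than all at once lets the model adapt to each replacement in turn.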