Problem Definition
Setting: Latent reasoning generation
Inputs: Input query x
Outputs: Final answer y, produced after processing a sequence of intermediate latent reasoning steps c = (c_1, c_2, ..., c_T), where each c_t is a high-dimensional vector/tensor rather than a token
Pipeline Flow
- Representation Initialization: Create latent thought vectors
- Model Optimization: Train via SFT or RL
- Inference Exploration: Scale sequentially or in parallel
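The three pipeline stages can be sketched as function composition in a toy numpy model. Everything here (the tanh "block", the projection matrices, the dimensions) is a hypothetical stand-in for the base LLM, not any specific surveyed system:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # toy latent dimension
W = rng.normal(scale=0.3, size=(d, d))  # stand-in for the base model's weights
W_out = rng.normal(size=(10, d))        # stand-in answer head (10-word toy vocab)

def initialize_latent(query_vec):
    """Representation Initializer: map the query to an initial latent thought.
    (Hypothetical: real initializers vary, self-contained or auxiliary.)"""
    return np.tanh(query_vec)

def latent_reason(c0, num_steps):
    """Latent Reasoner: iterate in vector space, never decoding to text."""
    c = c0
    for _ in range(num_steps):
        c = np.tanh(W @ c)
    return c

def generate_answer(c_final):
    """Answer Generator: project the final latent state to vocabulary logits."""
    return W_out @ c_final

logits = generate_answer(latent_reason(initialize_latent(rng.normal(size=d)), 4))
```

Inference-time scaling then amounts to increasing `num_steps` (sequential) or running several such chains at once (parallel).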
System Modules
Representation Initializer
Establish the initial format of the latent thought
Model or implementation: Varied (Self-contained or Auxiliary)
Latent Reasoner
Process the reasoning chain in latent space without decoding to text
Model or implementation: Base LLM (Transformer)
Answer Generator
Produce the final human-readable answer
Model or implementation: Base LLM Head
Novel Architectural Elements
- De-linguistified reasoning pathway: decoupling the computational process of reasoning from the explicit generation of language tokens
- Recursive hidden state injection: feeding the output hidden state of step t directly as the input embedding for step t+1 (Coconut, System-1.5)
- Parallel reasoning trajectories: exploring multiple latent paths simultaneously in vector space before collapsing to a final answer
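The two mechanical ideas above, recursive hidden state injection and parallel trajectories, can be combined in one minimal numpy sketch. The scoring head and all shapes are assumptions for illustration, not the Coconut or System-1.5 implementations:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 8, 5, 4                       # latent dim, parallel paths, latent steps (toy)
W = rng.normal(scale=0.3, size=(d, d))  # stand-in for the base model's weights
scorer = rng.normal(size=d)             # hypothetical value head scoring each path

# Recursive hidden state injection, batched over K trajectories: the hidden
# state of step t is fed directly as the input embedding of step t+1.
C = rng.normal(size=(K, d))             # K perturbed initial latent thoughts
for _ in range(T):
    C = np.tanh(C @ W.T)                # stays in vector space; no token decoding

scores = C @ scorer                     # score each latent trajectory
best = C[np.argmax(scores)]             # collapse to one path before answering
```

The key property is that the loop body touches only vectors; text appears nowhere until the final answer is decoded from `best`.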
Modeling
Base Model: Survey covers multiple architectures (e.g., Llama, Qwen, Pythia)
Training Method: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL)
Objective Functions:
- Outcome (answer-only) loss
Purpose: Indirect supervision via final-answer correctness.
Formally: Standard language-modeling loss applied to the final answer y only, with no loss on the latent steps.
- Reconstruction loss
Purpose: Direct supervision via reconstruction.
Formally: An auxiliary decoder trains each latent vector c_t to reconstruct the explicit text of reasoning step t.
- Distillation (alignment) loss
Purpose: Alignment via knowledge distillation.
Formally: MSE loss forcing the student's latent states to match the teacher's (explicit-CoT) hidden states.
- RL objective
Purpose: Optimization of reasoning strategies via reinforcement learning.
Formally: Maximize a correctness reward R using algorithms such as REINFORCE or GRPO, often with a KL penalty.
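The supervised objectives above can be sketched with toy numpy implementations; the shapes and the 0.5 mixing weight are illustrative assumptions, not values from any surveyed paper:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def answer_loss(logits, y_ids):
    """Outcome-only supervision: language-modeling loss on the answer tokens y,
    with no loss applied to the latent steps c_1..c_T."""
    lp = log_softmax(logits)
    return -lp[np.arange(len(y_ids)), y_ids].mean()

def distill_loss(student_latents, teacher_hidden):
    """Alignment: MSE between student latent states and the hidden states of a
    teacher run on the explicit CoT."""
    return ((student_latents - teacher_hidden) ** 2).mean()

# Toy data: 3 answer tokens over a 10-word vocab, 4 latent steps of dim 8.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10))
y = np.array([1, 4, 7])
student = rng.normal(size=(4, 8))
teacher = rng.normal(size=(4, 8))

total = answer_loss(logits, y) + 0.5 * distill_loss(student, teacher)
```

The reconstruction loss would follow the same pattern with an auxiliary decoder in place of the teacher; the RL objective replaces these differentiable losses with a sampled-reward policy gradient.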
Comparison to Prior Work
- vs. Explicit CoT: Latent CoT avoids generating intermediate tokens, reducing latency and bypassing vocabulary constraints
- vs. Implicit Reasoning [Standard LLM]: Latent CoT explicitly allocates computational steps for reasoning, whether horizontal (extra sequential latent steps) or vertical (extra depth or recurrence), rather than relying solely on depth-wise propagation within a single forward pass
Limitations
- Interpretability: Latent thoughts are opaque vectors, making it difficult to understand or debug the model's reasoning logic compared to explicit text
- Supervision: Lack of ground-truth 'latent thoughts' makes training unstable; relying solely on outcome supervision can lead to reward hacking
- Evaluation Gap: Hard to verify if the model is genuinely reasoning or just exploiting correlations, as the internal process is unobservable
Reproducibility
This is a survey paper. The authors provide a GitHub repository (https://github.com/EIT-NLP/Awesome-Latent-CoT) tracking the relevant papers and codebases discussed in the taxonomy.