CoT2: Chain of Thought with Continuous Tokens—a method where reasoning steps are dense vectors (weighted sums of embeddings) rather than single discrete tokens
MNNS: Minimum Non-Negative Sum—a task that assigns + or − signs to a set of numbers so that the resulting sum is non-negative and as small as possible, used as a proxy for search/planning capabilities
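The MNNS objective can be made concrete with a brute-force reference solver; the function name `mnns` is illustrative, not from the source:

```python
from itertools import product

def mnns(nums):
    # Try every +/- sign assignment and keep the smallest
    # sum that is still non-negative.
    best = None
    for signs in product((1, -1), repeat=len(nums)):
        s = sum(sign * n for sign, n in zip(signs, nums))
        if s >= 0 and (best is None or s < best):
            best = s
    return best

print(mnns([3, 5, 7]))  # 3 - 5 + 7 = ... the best achievable is 1
```

The exponential number of sign assignments is what makes MNNS a search/planning task rather than a one-step lookup.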
CSFT: Continuous Supervised Fine-Tuning—training the model to output a target probability distribution over tokens rather than a single ground-truth token
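The CSFT loss is a cross-entropy against a full target distribution rather than a one-hot label; a minimal NumPy sketch (the helper name `csft_loss` is an assumption):

```python
import numpy as np

def csft_loss(logits, target_dist):
    # Cross-entropy between the model's predicted distribution
    # and a target probability distribution over the vocabulary,
    # instead of a single ground-truth token id.
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return float(-np.sum(target_dist * log_probs))
```

With a one-hot `target_dist` this reduces to standard next-token cross-entropy, so ordinary SFT is a special case.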
MTS: Multi-Token Sampling—an inference strategy that samples K discrete tokens and averages their embeddings to form a continuous token, with K controlling how many reasoning traces are tracked in parallel
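A minimal sketch of one MTS step, assuming a next-token distribution `probs` and an embedding matrix; the function name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_token_sample(probs, embedding_matrix, k):
    # Sample K discrete token ids from the model's distribution,
    # then average their embedding rows to form one continuous token.
    ids = rng.choice(len(probs), size=k, p=probs)
    return embedding_matrix[ids].mean(axis=0)
```

Larger K tracks more candidate tokens at once; K = 1 recovers ordinary discrete sampling.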
superposition: Representing multiple distinct states simultaneously by taking a weighted sum of their embedding vectors
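A superposition is just a probability-weighted sum over the embedding matrix, which is also how CoT2 forms its continuous reasoning tokens; the function name below is an assumption:

```python
import numpy as np

def continuous_token(probs, embedding_matrix):
    # Weighted sum of all token embeddings: the resulting vector
    # represents several candidate tokens simultaneously.
    return probs @ embedding_matrix
```

If `probs` is one-hot, the superposition collapses to a single discrete token's embedding.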
ProntoQA: A logical reasoning benchmark testing whether a target node is reachable from a start node in a graph via deductive steps
ProsQA: A logical reasoning benchmark similar to ProntoQA but asking which of two target nodes is reachable
GRPO: Group Relative Policy Optimization—an RL algorithm that optimizes policies based on relative performance of a group of outputs
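The "relative" part of GRPO can be sketched as normalizing each sampled output's reward against its group's statistics; this is only the advantage computation, not the full policy update, and the function name is illustrative:

```python
import numpy as np

def group_relative_advantages(rewards):
    # Each output's advantage is its reward relative to the
    # group mean, scaled by the group's standard deviation.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

Because advantages are computed within the group, no separate learned value function is needed as a baseline.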
COCONUT: A prior method (cited baseline) that replaces discrete tokens with the last hidden state of the LLM in a curriculum learning fashion
embedding dimension: The size (d) of the vector space used to represent tokens; determines the capacity for packing information in CoT2