CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
BPE: Byte Pair Encoding—a tokenization method that breaks text into subword units, often obscuring the vertical alignment of characters in ASCII grids
pass@1: A metric measuring the percentage of problems solved correctly on the first attempt
premature commitment: An error mode where the model commits to a wrong solution path early in its reasoning, before gathering sufficient evidence
constraint forgetting: An error mode where the model proposes moves that violate explicit puzzle rules (e.g., crossing lines)
repeated reasoning: A behavioral pattern where the model retries the same reasoning path without variation; found to be a benign symptom of search rather than a cause of failure
global invariants: Properties that must hold true for the entire system simultaneously, such as 'all nodes must be connected' or 'the loop must be closed'
LLM-as-a-judge: Using a strong language model to annotate or evaluate the outputs of another model (used here to classify error types in reasoning traces)
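The pass@1 entry above is the k=1 case of the standard unbiased pass@k estimator (Chen et al., 2021): draw n samples per problem, count c correct, and estimate the probability that at least one of k draws passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n samples of which c are
    correct, estimate P(at least one of k sampled attempts passes).
    Computed as 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # fewer than k incorrect samples: every size-k draw contains a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 this reduces to the raw fraction of correct samples, c / n:
print(pass_at_k(10, 5, 1))  # 0.5
```

Averaging this estimate over all problems gives the benchmark-level pass@1 score.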
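To make the BPE entry concrete, the toy greedy-merge tokenizer below (the merge table and grid row are invented for illustration, not any real tokenizer's vocabulary) shows how a visually regular ASCII row splits into tokens of uneven width, so characters that sit in the same column on screen need not sit at the same token boundary:

```python
def bpe_tokenize(text, merges):
    """Greedy BPE: start from single characters and repeatedly merge the
    adjacent pair with the best (lowest) priority until none remain."""
    tokens = list(text)
    while True:
        best = None
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best[1]]):
                best = (i, pair)
        if best is None:
            return tokens
        i, pair = best
        tokens = tokens[:i] + [pair[0] + pair[1]] + tokens[i + 2:]

# Hypothetical merge table: pair -> priority (lower merges first)
merges = {("-", "-"): 0, ("--", "--"): 1, (".", " "): 2}

row = ". -- . -- ."
print(bpe_tokenize(row, merges))
# Tokens have widths 2, 2, 1, 2, 2, 1, 1 -- grid columns no longer
# align with token boundaries.
```

Because real BPE vocabularies merge common character runs the same way, two rows of a puzzle grid that align perfectly as text can tokenize into different numbers of tokens, which is what obscures vertical structure for the model.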