TC0: Complexity class of problems solvable by constant-depth, polynomial-size circuits with unbounded fan-in AND, OR, and majority (threshold) gates (e.g., addition, multiplication).
NC1: Complexity class of problems solvable by logarithmic-depth, polynomial-size circuits with bounded fan-in; NC1-complete problems (e.g., word problems of non-solvable groups) are conjectured to lie outside TC0 and are therefore considered inherently sequential.
CoT: Chain-of-Thought—generating intermediate reasoning steps before the final answer.
NoPE: No Positional Encoding—a transformer variant where position information is not explicitly added to embeddings.
State-tracking: Updating the status of entities step-by-step based on a sequence of actions.
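A minimal sketch of what state-tracking means operationally (the entity names and statuses here are illustrative, not from the task):

```python
# State-tracking sketch: the state is a mapping from entities to statuses,
# and each action in the sequence updates one entity's status in turn.
def track(actions, initial):
    state = dict(initial)
    for entity, new_status in actions:
        state[entity] = new_status  # step-by-step update
    return state

# Example: two boxes being opened and closed over time.
final = track(
    [("box1", "open"), ("box2", "open"), ("box1", "closed")],
    {"box1": "closed", "box2": "closed"},
)
print(final)  # {'box1': 'closed', 'box2': 'open'}
```

Answering "what is the status of box1?" correctly requires following every intermediate update, which is what makes the task a probe of sequential computation.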
LEGO: Learning Equality and Group Operations—a synthetic reasoning task involving variables, values, and operations.
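A toy LEGO-style instance can be sketched as a chain of variable assignments over {+1, -1}; the clause encoding below is a simplified stand-in, not the exact task format:

```python
# Hypothetical LEGO-style instance: each clause assigns a variable either a
# literal value in {+1, -1}, the value of another variable ("same"), or its
# negation ("neg"). Resolving a variable means following the equality chain.
clauses = {"a": ("val", 1), "b": ("neg", "a"), "c": ("same", "b"), "d": ("neg", "c")}

def resolve(var, clauses):
    op, arg = clauses[var]
    if op == "val":
        return arg
    ref = resolve(arg, clauses)  # recurse through the chain of references
    return -ref if op == "neg" else ref

print({v: resolve(v, clauses) for v in clauses})  # {'a': 1, 'b': -1, 'c': -1, 'd': 1}
```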
Attention concentration: The mechanism by which an attention head learns to focus almost exclusively on the single relevant token, ignoring distractors.
Distractors: Irrelevant tokens in the context that might incidentally point to the correct answer during training but confuse the model at test time.
Simply transitive action: A group action where, for any pair of states y1 and y2, exactly one group element maps y1 to y2 (e.g., a cyclic group acting on itself).
Symmetry group action: A group action where multiple group elements can map y1 to y2 (e.g., the permutation group Sn acting on n objects), creating ambiguity/distractors.
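The contrast between the two kinds of group action can be checked directly (small n chosen for illustration):

```python
from itertools import permutations

# Simply transitive: the cyclic group Z_n acting on itself by addition.
# For any pair (y1, y2) exactly one element g satisfies (g + y1) mod n == y2.
n = 5
y1, y2 = 1, 4
cyclic_solutions = [g for g in range(n) if (g + y1) % n == y2]
assert len(cyclic_solutions) == 1  # the unique solution is g = 3

# Symmetry group: S_3 acting on positions {0, 1, 2}. Several permutations
# send y1 to y2, so observing the pair (y1, y2) does not identify the
# group element -- this is the ambiguity that creates distractors.
y1, y2 = 0, 2
perm_solutions = [p for p in permutations(range(3)) if p[y1] == y2]
assert len(perm_solutions) > 1  # 2 of the 6 permutations map 0 to 2
print(len(cyclic_solutions), len(perm_solutions))  # 1 2
```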
Recursive self-training: A curriculum in which the model is retrained on reasoning traces it generated at shorter problem lengths in order to solve longer ones.