Recurrence: A computational process in which the current state h_t is computed from the previous state h_{t-1} via a function g, so the hidden state preserves the full computational history.
Autoregression: A process where the current output is inferred from previous observed outputs (tokens) o_{t-1}, which may contain only partial information compared to the hidden state.
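The contrast between the two definitions can be sketched in a few lines. This is an illustrative toy (the update functions, dimensions, and the Fibonacci example are assumptions, not from the paper): the recurrent step carries a full hidden state forward, while the autoregressive step sees only previously emitted outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))

def recurrent_step(h_prev, x_t):
    # Recurrence: h_t = g(h_{t-1}, x_t); the full hidden state is carried forward.
    return np.tanh(W_h @ h_prev + W_x @ x_t)

def autoregressive_step(outputs):
    # Autoregression: the next output depends only on previously *emitted*
    # outputs o_{t-1}, ..., which may encode less than a hidden state would.
    return outputs[-1] + outputs[-2]  # toy rule: sum of the last two outputs

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h = recurrent_step(h, x)  # h summarizes the entire input history

outs = [1, 1]
for _ in range(5):
    outs.append(autoregressive_step(outs))
print(outs)  # [1, 1, 2, 3, 5, 8, 13]
```

The autoregressive loop recovers the Fibonacci sequence because, for that task, the last two emitted tokens happen to contain all the state needed; in general, emitted tokens may discard information that a hidden state would have kept.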
Chain of Thought: A prompting strategy that encourages the model to generate intermediate reasoning steps, which this paper argues simulates recurrent memory.
Depth Complexity: The number of sequential steps required to process an input; Transformers process all positions in parallel and therefore have O(1) depth, while RNNs require O(n) sequential steps.
Finite Automata: A theoretical model of computation (state machine) defined by states and transitions, used here as a baseline for recurrent capability.
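A finite automaton is itself a minimal recurrence: each transition is an update state_t = delta(state_{t-1}, input_t). A small illustrative sketch (the parity automaton below is a standard example, not one taken from the paper):

```python
def parity_dfa(bits):
    # Two-state finite automaton tracking the parity of 1s seen so far.
    # Each step is a recurrent state update: state_t = delta(state_{t-1}, b_t).
    delta = {("even", 0): "even", ("even", 1): "odd",
             ("odd", 0): "odd",  ("odd", 1): "even"}
    state = "even"  # initial state
    for b in bits:
        state = delta[(state, b)]
    return state

print(parity_dfa([1, 0, 1, 1]))  # "odd" (three 1s)
```

Tracking parity requires remembering one bit of state across the whole input, which is why automata of this kind serve as a baseline for recurrent capability.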
RWKV: Receptance Weighted Key Value, an RNN-like Transformer architecture analyzed in this paper.
Linear Transformer: A Transformer variant with linear attention complexity, analyzed for its recurrent capabilities.
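Linear attention admits an exactly equivalent recurrent form, which is the basis for analyzing its recurrent capabilities. A minimal sketch of unnormalized causal linear attention (shapes and variable names are illustrative assumptions): the state S_t = S_{t-1} + k_t v_t^T is updated per step, and the output is y_t = q_t^T S_t.

```python
import numpy as np

def linear_attention_recurrent(Q, K, V):
    # Unnormalized causal linear attention as a recurrence:
    #   S_t = S_{t-1} + k_t v_t^T,   y_t = q_t^T S_t
    # The constant-size state S is what gives linear Transformers an
    # RNN-like inference mode with O(1) memory per step.
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    out = np.empty((n, V.shape[1]))
    for t in range(n):
        S = S + np.outer(K[t], V[t])  # accumulate key-value state
        out[t] = Q[t] @ S             # read out against current query
    return out

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(6, 4)), rng.normal(size=(6, 4)), rng.normal(size=(6, 2))
y_rec = linear_attention_recurrent(Q, K, V)
# Equivalent parallel form with a causal mask: y = tril(Q K^T) V
y_par = np.tril(Q @ K.T) @ V
print(np.allclose(y_rec, y_par))  # True
```

The recurrent and parallel forms agree term by term, since y_t = sum over s <= t of (q_t . k_s) v_s in both cases.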