KV cache: Key-Value cache; stores the attention keys and values of previously processed tokens so they are not recomputed at each step of autoregressive generation
Horizontal scanning: The standard Transformer behavior where each new token attends to the entire flat history of previous tokens
Vertical scanning: PHOTON's approach of representing context via compact coarse states and descending to token-level details only when necessary
HierGen: Hierarchical Generation; a decoding scheme in which chunks that share the same higher-level context are decoded independently and in parallel
RecGen: Recursive Generation; a decoding schedule that updates only the coarsest stream using a summary from the decoder, avoiding bottom-up re-encoding
Chunker: A module that aggregates multiple fine-grained representations into a single coarse representation (e.g., via concatenation and projection)
Meta-context: The set of tokens generated by a single step of the top-level coarse encoder
Pareto frontier: The set of trade-off points between two competing metrics (here, throughput vs. quality) at which neither metric can be improved without worsening the other
Throughput per unit memory: A metric dividing throughput (tokens or requests processed per unit time) by the memory footprint required to achieve it
Recursive consistency: A property where the bottom-up encoding of a sequence matches its top-down reconstruction, ensured via an auxiliary loss
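
As an illustration of the Chunker entry, the following is a minimal sketch of "concatenation and projection" in plain Python. The function name `chunk`, the parameter names, and the averaging weight matrix are all hypothetical choices made for this example; the source does not specify the projection, and a real model would learn it.

```python
from typing import List

Vector = List[float]


def chunk(states: List[Vector], chunk_size: int, weight: List[Vector]) -> List[Vector]:
    """Concatenate each group of `chunk_size` fine vectors and project it
    to a single coarse vector.

    `weight` is a (d_coarse x chunk_size * d_fine) matrix given as a list of rows.
    This is an illustrative assumption, not the paper's implementation.
    """
    if len(states) % chunk_size != 0:
        raise ValueError("sequence length must be a multiple of chunk_size")
    coarse = []
    for start in range(0, len(states), chunk_size):
        # Concatenation: flatten the group into one long vector.
        concat = [x for vec in states[start:start + chunk_size] for x in vec]
        # Projection: a plain matrix-vector product (no bias, no nonlinearity).
        coarse.append([sum(w * x for w, x in zip(row, concat)) for row in weight])
    return coarse


# Four 2-d token states, chunk size 2, projected from 4 to 2 dimensions.
# The hand-written weights average the two vectors in each chunk.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
avg_proj = [[0.5, 0.0, 0.5, 0.0],
            [0.0, 0.5, 0.0, 0.5]]
coarse_states = chunk(tokens, chunk_size=2, weight=avg_proj)
print(coarse_states)  # [[2.0, 3.0], [6.0, 7.0]]
```

Four fine-grained states become two coarse states, which is the reduction in sequence length that lets the higher levels work on a compact context.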