TAPE: Contextualized Equivariant Positional Encoding—the proposed framework that updates positional embeddings layer-wise based on context while preserving geometric symmetries
RoPE: Rotary Positional Embedding—a method encoding position by rotating query/key vectors, used here as the initialization for TAPE
Equivariance: A property where transforming the input (e.g., permuting tokens) produces a corresponding transformation of the output (e.g., the outputs are permuted in the same way), so the model behaves consistently under that symmetry
NC1: The complexity class of problems solvable by polynomial-size Boolean circuits of logarithmic depth with bounded fan-in; TAPE is proven to represent algorithms in this class
O(R)-invariance: Invariance to orthogonal transformations (rotations/reflections) in the R-dimensional subspace, ensuring attention depends only on relative distances
Flash Attention: An I/O-aware exact attention algorithm that speeds up training and reduces memory usage
PEFT: Parameter-Efficient Fine-Tuning—adapting a pre-trained model by updating only a small subset of parameters
SCROLLS: A benchmark for evaluating models on long-context natural language understanding tasks
Perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower values indicate better performance
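
To make the perplexity entry concrete, here is a minimal sketch of the standard calculation: the exponential of the mean negative log-probability that a model assigns to the tokens it actually observed. The function name `perplexity` and the example probabilities are illustrative assumptions, not values or code from the work described here.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the observed tokens.

    token_probs: probabilities (each in (0, 1]) that a model assigned to the
    tokens that actually occurred. Hypothetical helper for illustration only.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Negative log-likelihood of each observed token.
    nll = [-math.log(p) for p in token_probs]
    # Average over tokens, then exponentiate.
    return math.exp(sum(nll) / len(nll))


# Made-up probabilities: a model that always assigns 0.25 to the true token
# versus one that is more confident about the true token.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.95, 0.85])
print(uniform, confident)
```

A model that assigns probability 0.25 to every observed token has perplexity 4, as if it were choosing uniformly among four options at each step; the more confident model scores lower, which matches the "lower is better" reading in the definition.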