CTR: Click-Through Rate, the ratio of users who click on a specific link to the total number of users who view a page, email, or advertisement
Sliding-window paradigm: A data formulation strategy where a unique training sample is created for every single interaction using a fixed-size window of preceding items as context
FLOPs: Floating-point operations, a count of the arithmetic operations a computation performs, used as a measure of computational cost
Hidden-state leakage: A phenomenon where a model inadvertently accesses information from tokens outside its intended context window (e.g., future tokens or distant past) during training
Positional bias overfitting: When a model learns to rely on the specific position index of an item in the input sequence rather than its semantic content
Streaming prompt: A long prompt that contains k prediction targets in sequence, allowing the model to process all of them in a single forward pass
[SUM] token: A special token inserted after each target interaction to aggregate information and serve as the position for classification loss
Windowed causal attention: An attention mechanism where each token can attend only to a fixed-size window of n preceding tokens, rather than to all preceding tokens
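
Two of the definitions above, the sliding-window paradigm and windowed attention, can be made concrete with a minimal Python sketch. The function names are hypothetical, and two details are assumptions not stated in the glossary: the first interaction is skipped because it has no preceding context, and each token is allowed to attend to itself in addition to its n predecessors.

```python
from typing import List, Tuple


def sliding_window_samples(history: List[str], n: int) -> List[Tuple[List[str], str]]:
    """Build one (context, target) sample per interaction.

    The context is the window of up to n items immediately preceding the target.
    Assumption: the first interaction is skipped, since it has no context.
    """
    return [(history[max(0, i - n):i], history[i]) for i in range(1, len(history))]


def windowed_causal_mask(seq_len: int, n: int) -> List[List[bool]]:
    """Return mask[i][j], True when token i may attend to token j.

    Token j must not lie in the future (j <= i) and must be at most
    n positions back (i - j <= n). Assumption: a token may attend to itself.
    """
    return [[0 <= i - j <= n for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    for context, target in sliding_window_samples(["a", "b", "c", "d"], n=2):
        print(context, "->", target)
    for row in windowed_causal_mask(seq_len=4, n=1):
        print(["x" if allowed else "." for allowed in row])
```

In the mask, each row is a querying token and each column a token it may read; a full causal mask would fill the entire lower triangle, whereas the windowed version keeps only a band of width n + 1 along the diagonal.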