LLaTTE (LLM-Style Latent Transformers for Temporal Events): The paper's proposed transformer architecture, designed for efficiency and scaling
MLA (Multi-head Latent Attention): A memory-efficient attention mechanism (from DeepSeek) that jointly compresses keys and values into a low-rank latent vector, shrinking the key-value cache
NE (Normalized Entropy): A standard metric for ads ranking, defined as the average log loss divided by the entropy of the background click rate; lower is better
DHEN (Deep Heterogeneous Ensemble Network): A non-sequence backbone architecture used for processing static sparse and dense features
Upstream Model: A large, asynchronous model that processes long user histories to generate cached user embeddings, not bound by request-time latency
Online Model: A smaller, synchronous model that serves real-time requests using cached upstream embeddings and recent user actions
Transfer Ratio: A metric quantifying how much of the performance gain from the large upstream model is preserved when its output is used by the smaller online model
Pyramidal Reduction: An architectural optimization that selectively drops older tokens at deeper transformer layers to reduce computation
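
The NE definition above can be made concrete with a small sketch. This is an illustration of the standard formula (average binary log loss divided by the entropy of the empirical click rate), not code from the paper; the function name `normalized_entropy` and the clipping constant `eps` are assumptions chosen for this example.

```python
import math


def normalized_entropy(labels, probs, eps=1e-12):
    """Average log loss divided by the entropy of the background click rate.

    labels: iterable of 0/1 click labels.
    probs:  iterable of predicted click probabilities, same length.
    eps:    hypothetical clipping constant to avoid log(0).
    """
    labels = list(labels)
    probs = list(probs)
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal length")

    # Average binary cross-entropy (log loss) of the predictions.
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    avg_log_loss = total / len(labels)

    # Entropy of the background (empirical) click rate.
    rate = sum(labels) / len(labels)
    if rate <= 0.0 or rate >= 1.0:
        raise ValueError("background click rate must be strictly between 0 and 1")
    background = -(rate * math.log(rate) + (1.0 - rate) * math.log(1.0 - rate))

    return avg_log_loss / background
```

A model that always predicts the background click rate scores an NE of 1, so values below 1 indicate the model is more informative than that baseline.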
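
The idea behind Pyramidal Reduction can be sketched as follows. The glossary does not specify how many tokens are kept at each layer, so the halving schedule, the `min_tokens` floor, and both function names are assumptions made purely for illustration; layers are stand-in callables rather than real transformer blocks.

```python
def pyramidal_keep_counts(seq_len, num_layers, min_tokens=1):
    """Hypothetical schedule: halve the kept tokens per layer, down to a floor."""
    if min_tokens < 1:
        raise ValueError("min_tokens must be at least 1")
    counts = []
    n = seq_len
    for _ in range(num_layers):
        counts.append(n)
        n = max(min_tokens, n // 2)
    return counts


def apply_pyramidal_reduction(tokens, layers, keep_counts):
    """Run layers in order, dropping the oldest tokens before each one.

    tokens:      sequence ordered oldest to newest.
    layers:      callables mapping a token sequence to a same-length sequence.
    keep_counts: number of most recent tokens each layer is allowed to see.
    """
    x = list(tokens)
    for layer, keep in zip(layers, keep_counts):
        x = x[-keep:]  # keep only the `keep` most recent tokens
        x = layer(x)
    return x


# Usage with identity "layers" to show which positions survive.
identity = lambda seq: list(seq)
counts = pyramidal_keep_counts(seq_len=8, num_layers=4, min_tokens=2)
survivors = apply_pyramidal_reduction(range(8), [identity] * 4, counts)
print(counts)     # [8, 4, 2, 2]
print(survivors)  # [6, 7]
```

Because attention cost grows with sequence length, deeper layers that see only the most recent tokens do correspondingly less work, which is the computational saving the definition refers to.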