SID: Semantic ID—a sequence of discrete tokens representing an item, used to replace atomic item IDs in generative recommendation.
FAMAE: Field-Aware Masked Auto-Encoding—the proposed method for learning item embeddings by predicting masked feature fields (e.g., category, brand) from unmasked ones.
GAOQ: Globally Aligned Orthogonal Quantization—the proposed quantization method that aligns code indices globally to ensure consistent semantics across different hierarchical branches.
RQ-VAE: Residual Quantized Variational Autoencoder—a standard method for discretizing vectors into a sequence of codes by recursively quantizing residuals.
Prefix-conditional uncertainty: The entropy of the next token's distribution given the tokens already generated; lowering it makes the next token more predictable, which makes autoregressive generation easier.
Collaborative signals: Patterns derived from user interactions (e.g., users who buy X also buy Y), distinct from semantic similarity (X looks like Y).
Autoregressive modeling: Predicting a sequence one token at a time, where each prediction depends on previous ones.
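
To make the RQ-VAE entry more concrete, here is a minimal sketch of the residual quantization step alone: each level picks the nearest codebook entry, and the remainder is passed to the next level. It deliberately omits the encoder, decoder, and codebook learning that a full RQ-VAE includes. The function names and the toy codebooks are hypothetical, chosen only for illustration.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def residual_quantize(vec, codebooks):
    """Quantize vec into one code per level; each level encodes the residual of the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next level only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


# Hypothetical toy codebooks: a coarse level followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]],
    [[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]],
]

codes, residual = residual_quantize([1.4, 0.6], codebooks)
print(codes)  # [1, 1]
```

The resulting code sequence plays the role of a SID in this toy setting: the first code gives a coarse position and the second refines it, so the leftover residual shrinks with each level.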