Cardinality: The number of unique values a feature can take; high cardinality (many unique values) can make learning difficult for ML models
Semantic IDs: Discrete item identifiers derived from content (e.g., via quantization), preserving semantic similarity in the ID space
RQ-VAE: Residual-Quantized Variational AutoEncoder, a neural network that compresses high-dimensional data into a short sequence of discrete codes by quantizing successive residuals (used here as a baseline)
LLM Agents: LLMs wrapped in a control loop that allows them to reason, plan, and execute actions (like querying or refining) iteratively
Vocabulary Explosion: A failure mode where a generative model produces too many unique variations of a term (e.g., 'rock', 'rock music', 'classic rock'), diluting the signal
Inductive Bias: Assumptions built into a learning algorithm; here, error reports from annotators serve as inductive bias for the architect to refine the vocabulary
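
To make the Semantic IDs and RQ-VAE entries more concrete, the following is a minimal sketch of the residual quantization step only, not the full RQ-VAE (there is no encoder, decoder, or training here). The codebooks, the item vectors, and the function names `nearest` and `residual_quantize` are hypothetical and chosen for illustration; in a real system the codebooks would be learned.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def residual_quantize(
    v: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize v level by level; each level encodes what the previous one missed."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next level sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes), residual


# Hypothetical hand-written codebooks (assumption: learned in practice).
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 4.0]],              # level 1: coarse region
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # level 2: fine correction
]

id_a, _ = residual_quantize([4.9, 4.1], CODEBOOKS)  # (1, 1)
id_b, _ = residual_quantize([4.2, 5.0], CODEBOOKS)  # (1, 2)
id_c, _ = residual_quantize([0.1, 0.9], CODEBOOKS)  # (0, 2)
```

In this toy setup the two nearby items share the first code and differ only in the second, while the distant item gets a different first code. That shared prefix is what "preserving semantic similarity in the ID space" refers to.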