Semantic IDs: Discrete token representations derived from item content features (like audio or lyrics) via quantization, allowing LLMs to generate item identifiers directly
RVQ: Residual Vector Quantizer, a method that compresses a high-dimensional vector into a short sequence of discrete codes (a Semantic ID) by quantizing the vector and then iteratively quantizing the residual error left by each previous stage
BM25: Best Matching 25, a probabilistic information retrieval function that ranks documents by how often the query terms appear in each document, weighted by how rare each term is across the collection and normalized by document length
BPR: Bayesian Personalized Ranking—an optimization criterion for personalized recommendation that focuses on the relative order of items (ranking) rather than absolute ratings
CLAP: Contrastive Language-Audio Pretraining—a model that learns joint embeddings for audio and text, enabling text-to-audio retrieval
Hit@K: A metric measuring the fraction of test cases in which the correct (ground truth) item appears among the top K recommendations
SigLIP: Sigmoid Loss for Language Image Pre-training—a multimodal model connecting images and text, used here for image-based music retrieval (e.g., album art)
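
To make the RVQ and Hit@K entries above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method of any particular system: the function names (`rvq_encode`, `hit_at_k`) are hypothetical, and the codebooks are assumed to be given (in practice they would be learned), with nearest-neighbor lookup by squared Euclidean distance.

```python
from typing import List, Sequence


def nearest_index(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def rvq_encode(vec: Sequence[float],
               codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Encode vec as one code per level (hypothetical helper).

    Each level quantizes whatever error the previous levels left behind.
    The codebooks are assumed to be already trained.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        codes.append(idx)
        # Subtract the chosen entry; the remainder goes to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def hit_at_k(ranked_lists: Sequence[Sequence[str]],
             ground_truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose ground-truth item is in the top k (hypothetical helper)."""
    hits = sum(1 for ranked, target in zip(ranked_lists, ground_truth)
               if target in ranked[:k])
    return hits / len(ground_truth)
```

With two hand-written 2-D codebooks, `[[0.0, 0.0], [1.0, 1.0]]` and `[[0.0, 0.0], [0.2, -0.1]]`, calling `rvq_encode([1.2, 0.9], ...)` picks entry 1 at the first level and then quantizes the leftover error (about `[0.2, -0.1]`) at the second level. The resulting list of indices plays the role of a Semantic ID.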