
Embedding in Recommender Systems: A Survey

Maolin Wang, Xinjian Zhao, Wanyu Wang, Sheng Zhang, Jiansheng Li, Bowen Yu, Binhao Wang, Shucheng Zhou, Dawei Yin, Qing Li, Ruocheng Guo, Xiangyu Zhao
City University of Hong Kong, Baidu Inc., Hong Kong Polytechnic University, Unaffiliated Researcher
arXiv (2023)
Recommendation KG P13N

📝 Paper Summary

Recommender Systems Representation Learning
This survey provides a comprehensive taxonomy of embedding techniques in recommender systems, categorizing methods into matrix, sequential, and graph-based structures while addressing scalability through AutoML and quantization.
Core Problem
High-dimensional discrete features (like user and item IDs) in recommender systems are sparse and computationally expensive to process directly, making it difficult to capture complex entity relationships effectively.
Why it matters:
  • Sparse interaction data leads to poor recommendation quality, most visibly in the cold-start setting, unless sparse IDs are mapped to dense representations
  • Scalability becomes a critical bottleneck as the number of users and items grows, rendering traditional, resource-intensive training methods inefficient
  • A unified framework is needed to navigate the evolution from simple Matrix Factorization to complex Graph and LLM-based approaches
Concrete Example: In a movie recommendation setup with millions of users and movies, the user rating matrix is 99% empty (sparse). Simple matrix factorization struggles to predict preferences for new users with few ratings. Embedding techniques address this by mapping sparse IDs to dense vectors, but selecting the right architecture (Matrix Factorization vs. Factorization Machines vs. graph-based models) is complex.
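To make the "sparse ID to dense vector" step concrete, the following is a minimal sketch of an embedding lookup followed by a matrix-factorization-style dot-product score. It is an illustration only, not the survey's method: the table sizes, the random initialization, and the names `make_embedding_table` and `mf_score` are all assumptions, and no training loop is shown.

```python
import random


def make_embedding_table(num_ids, dim, seed=0):
    """Build a (num_ids x dim) table of small random vectors.

    Hypothetical helper: in a real system these rows would be learned,
    not left at their random initial values.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def mf_score(user_table, item_table, user_id, item_id):
    """Matrix-factorization-style score: dot product of two embeddings."""
    user_vec = user_table[user_id]   # lookup replaces a one-hot user ID
    item_vec = item_table[item_id]   # lookup replaces a one-hot item ID
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Assumed toy sizes; the example in the text involves millions of IDs.
NUM_USERS, NUM_ITEMS, DIM = 1000, 500, 8
user_emb = make_embedding_table(NUM_USERS, DIM, seed=1)
item_emb = make_embedding_table(NUM_ITEMS, DIM, seed=2)

predicted = mf_score(user_emb, item_emb, user_id=42, item_id=7)
print(f"score for (user 42, item 7): {predicted:.4f}")
```

Each ID costs only `DIM` floats instead of a one-hot vector as long as the vocabulary, which is why the lookup table is the usual starting point before moving to richer architectures.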
Key Novelty
Systematic Taxonomy of RS Embeddings
  • Categorizes embedding approaches into three structural domains: Matrix-based (CF/MF), Sequential (RNNs/Transformers), and Graph-based (node2vec/GNNs)
  • Integrates efficiency-focused methodologies like AutoML, Hashing, and Quantization directly into the embedding taxonomy, addressing the 'how' of deployment alongside the 'what' of modeling
  • Identifies the emerging role of Large Language Models (LLMs) in enhancing semantic understanding for embeddings
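As a rough illustration of two of the efficiency ideas named above, the sketch below shows the hashing trick (many raw IDs share a fixed number of embedding rows) and uniform 8-bit quantization of a single embedding vector. Both are generic textbook variants chosen for brevity, not the specific methods covered by the survey; the function names, the use of CRC32 as the hash, and the min-max quantization scheme are assumptions.

```python
import zlib


def hashed_row(raw_id: str, num_buckets: int) -> int:
    """Hashing trick: map an arbitrary ID string to one of num_buckets rows.

    CRC32 is used only because it is deterministic across runs; distinct
    IDs may collide and then share the same embedding row.
    """
    return zlib.crc32(raw_id.encode("utf-8")) % num_buckets


def quantize(vec, levels=256):
    """Uniform min-max quantization of a float vector to integer codes."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / (levels - 1) or 1.0  # avoid zero scale for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats from the codes."""
    return [lo + c * scale for c in codes]


# Assumed toy inputs for illustration.
row = hashed_row("user_123", num_buckets=1000)
vec = [0.5, -0.25, 0.1, 0.0]
codes, lo, scale = quantize(vec)
restored = dequantize(codes, lo, scale)
print(row, codes, restored)
```

Hashing bounds the number of rows in the table regardless of vocabulary size, while quantization shrinks the storage per row; the reconstruction error per element is at most half of `scale`.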
Breakthrough Assessment
8/10
Provides a crucial structured overview of a massive, fragmented field. While it is a survey and not a new method, its taxonomy and inclusion of efficiency techniques (AutoML/Quantization) make it a high-value resource.