
Recommendations by Concise User Profiles from Review Text

Ghazaleh Haratinezhad Torbati, Anna Tigunova, Andrew Yates, Gerhard Weikum
Max Planck Institute for Informatics
arXiv (2023)
Recommendation P13N Memory

📝 Paper Summary

User modeling Content-based recommendation
CUP improves recommendations for text-rich but data-poor users by distilling noisy review histories into concise 128-token profiles used to train a BERT-based two-tower retrieval model.
Core Problem
Standard collaborative filtering performs poorly for long-tail users with sparse interactions, while feeding full review texts into LLMs is computationally expensive and suffers from a low signal-to-noise ratio.
Why it matters:
  • Users in domains such as books often have few ratings (sparse data) but write detailed reviews expressing diverse tastes, a signal that current systems fail to exploit effectively
  • Feeding long, raw interaction histories into transformers incurs a computational cost that grows quadratically with input length (because of self-attention) and dilutes the signal with irrelevant personal anecdotes found in reviews
  • Existing cold-start methods transfer knowledge to new items but do not easily extend to long-tail users, whose interests are idiosyncratic and highly diverse
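To put a rough number on the quadratic-cost point: standard self-attention computes one score per query-key pair, so an n-token input needs n² scores per head and layer. The sketch below compares a 128-token CUP profile with a raw history of 4,096 tokens. That history length is a hypothetical figure chosen for illustration, not a number from the paper, and the count ignores the parts of the network whose cost grows only linearly.

```python
def attention_scores(seq_len: int) -> int:
    """Query-key score entries in one self-attention head of one layer: n * n."""
    return seq_len * seq_len


def relative_cost(long_len: int, short_len: int) -> float:
    """How many times more attention scores the long input needs than the short one."""
    return attention_scores(long_len) / attention_scores(short_len)


PROFILE_LEN = 128        # CUP profile budget stated in the summary
RAW_HISTORY_LEN = 4096   # hypothetical raw review-history length (assumption)

print(f"profile: {attention_scores(PROFILE_LEN):,} scores per head/layer")          # 16,384
print(f"raw history: {attention_scores(RAW_HISTORY_LEN):,} scores per head/layer")  # 16,777,216
print(f"ratio: {relative_cost(RAW_HISTORY_LEN, PROFILE_LEN):.0f}x")                 # 1024x
```

Because the ratio is (4096 / 128)² = 32², shrinking the input 32-fold cuts the attention-score work roughly 1,024-fold.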
Concrete Example: A user review might mix helpful content cues ('the unusual murder weapon') with noise ('I read only on weekends') and sentiment ('fun to read'). Standard approaches ingest the noise, whereas CUP selects only the descriptive 'murder weapon' phrase to fit a strict 128-token budget.
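The selection idea in this example can be sketched as a greedy, budgeted filter. This is a minimal illustration under loud assumptions: the cue patterns, the phrase splitting, and the helper names (split_phrases, is_descriptive, build_profile) are hypothetical and are not the paper's actual heuristics, and tokens are approximated by whitespace-separated words rather than BERT subword tokens.

```python
import re

# Hypothetical cue patterns; the paper's real selection heuristics may differ.
NOISE_PATTERNS = [r"\bi read\b", r"\bweekends?\b", r"\bmy\b"]
SENTIMENT_PATTERNS = [r"\bfun to read\b", r"\bloved it\b", r"\bboring\b"]


def split_phrases(review: str) -> list[str]:
    """Split a review into rough phrases at sentence punctuation and semicolons."""
    return [p.strip() for p in re.split(r"[.!?;]+", review) if p.strip()]


def is_descriptive(phrase: str) -> bool:
    """Keep a phrase only if it matches no personal-noise or pure-sentiment cue."""
    low = phrase.lower()
    return not any(re.search(p, low) for p in NOISE_PATTERNS + SENTIMENT_PATTERNS)


def build_profile(reviews: list[str], budget: int = 128) -> str:
    """Greedily collect descriptive phrases without exceeding the token budget.

    Tokens are approximated by whitespace-separated words; a real system would
    count subword tokens with the model's own tokenizer.
    """
    kept: list[str] = []
    used = 0
    for review in reviews:
        for phrase in split_phrases(review):
            if not is_descriptive(phrase):
                continue  # drop anecdotes and generic sentiment
            n_tokens = len(phrase.split())
            if used + n_tokens > budget:
                continue  # skip phrases that would overflow the budget
            kept.append(phrase)
            used += n_tokens
    return "; ".join(kept)


review = "The unusual murder weapon. I read only on weekends. Fun to read!"
print(build_profile([review]))  # → The unusual murder weapon
```

Skipping (rather than stopping at) an oversized phrase lets shorter descriptive phrases later in the history still fill the remaining budget.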
Key Novelty
Concise User Profiles (CUP) as a Pre-computation Step
  • Decouples profile creation from the recommendation model: explicitly constructs a static, human-readable text profile (128 tokens) from long review histories, using selection heuristics or LLM summarization
  • Treats user text and item metadata as symmetric inputs to a two-tower architecture, enabling end-to-end fine-tuning of a small language model (BERT) on the condensed profiles
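A minimal sketch of the two-tower scoring step, assuming that one shared encoder embeds both the user profile and the item text and that relevance is the dot product of the two embeddings. A normalised bag-of-words vector stands in for BERT purely so the example runs without model weights; the encode, score, and rank helpers are hypothetical, and training (fine-tuning the encoder on user-item interactions) is not shown.

```python
import math
from collections import Counter


def encode(text: str) -> dict[str, float]:
    """Toy stand-in for BERT: an L2-normalised bag-of-words vector (sparse dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def score(user_profile: str, item_text: str) -> float:
    """Two-tower relevance: encode each side with the same encoder, then take a dot product."""
    user_vec = encode(user_profile)  # user tower
    item_vec = encode(item_text)     # item tower (shared weights assumed)
    return sum(w * item_vec.get(word, 0.0) for word, w in user_vec.items())


def rank(user_profile: str, items: dict[str, str]) -> list[str]:
    """Return item ids sorted by descending relevance to the user profile."""
    return sorted(items, key=lambda item_id: score(user_profile, items[item_id]), reverse=True)


profile = "unusual murder weapon detective mystery"
items = {
    "b1": "a detective mystery with an unusual murder weapon",
    "b2": "a gentle cookbook of weekend baking recipes",
}
print(rank(profile, items))  # → ['b1', 'b2']
```

A practical benefit of this layout is that item embeddings do not depend on the user, so they can be computed once and indexed for fast retrieval; only the user's profile needs to be encoded at query time.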
Breakthrough Assessment
7/10
Offers a practical, compute-efficient solution for the specific 'text-rich, data-poor' user segment. Although the architectural novelty is low (a standard two-tower model), the focus on concise profiling addresses a real bottleneck in LLM-based recommender systems.