LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Boyuan Long, Yueqi Wang, Hiloni Mehta, Mick Zomnir, Omkar Pathak, Changping Meng, Ruolin Jia, Yajun Peng, Dapeng Hong, Xia Wu, Mingyan Gao, Onkar Dalal, Ningren Han
Google
arXiv (2025)
Recommendation MM P13N

📝 Paper Summary

LLM-based Data Annotation, Video Recommendation Systems
This paper presents an industrial pipeline that uses LLMs to generate nuanced video attribute annotations at scale via knowledge distillation, integrating them into recommendations through personalized restricted retrieval.
Core Problem
Traditional ML classifiers for video recommendation are slow to develop and fail to capture nuanced, subjective attributes (such as 'vibes'), while human annotation does not scale.
Why it matters:
  • Current systems miss subtle content cues (e.g., 'inspiring' vs. 'energetic'), limiting personalization quality
  • Feedback loops in recommendation systems require high-quality content understanding to be effective
  • The scale of platforms like YouTube (millions of videos/day) makes direct human or heavy-model annotation prohibitive
Concrete Example: A traditional classifier might tag a video simply as 'vlog', missing that it has a specific 'authentic' vibe. An initial LLM prompt might wrongly exclude this video because it is heavily edited; the prompt is then refined iteratively against a human 'Golden Set' until the LLM correctly identifies the creator's genuine presentation.
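The refinement loop described above can be pictured as scoring each prompt revision against the Golden Set and keeping the best one. The following is a minimal sketch, not the paper's implementation: the function names `f1_score` and `pick_best_prompt`, the binary-label setup, and the toy labels are all assumptions made for illustration.

```python
def f1_score(golden, predicted):
    """Binary F1 of predicted labels against golden labels (parallel lists of bools)."""
    tp = sum(1 for g, p in zip(golden, predicted) if g and p)
    fp = sum(1 for g, p in zip(golden, predicted) if not g and p)
    fn = sum(1 for g, p in zip(golden, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_best_prompt(golden, labels_by_prompt):
    """Score every prompt revision on the Golden Set and return (best_name, all_scores)."""
    scores = {name: f1_score(golden, labels) for name, labels in labels_by_prompt.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: is each of five videos 'authentic' according to human raters?
golden = [True, True, False, True, False]
labels_by_prompt = {
    # First draft rejects heavily edited videos, so it misses two authentic ones.
    "prompt_v1": [False, True, False, False, False],
    # Revised draft recovers them at the cost of one false positive.
    "prompt_v2": [True, True, False, True, True],
}
best, scores = pick_best_prompt(golden, labels_by_prompt)
print(best, scores)
```

In this toy case the revised prompt wins because the recall it gains outweighs the single false positive; a real workflow would also inspect the disagreements to decide what to change in the next prompt revision.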
Key Novelty
End-to-End LLM-as-Annotator Production Pipeline
  • Deploys an iterative 'LLM-as-annotators' workflow where LLMs generate 'Silver Set' labels for nuanced attributes (e.g., vibes) that are then distilled into lightweight student DNNs for massive scale
  • Integrates these annotations into online serving via 'Personalized Restricted Retrieval', where user intent triggers specific searches within the annotated attribute vocabulary
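The two steps above can be sketched end to end under strong simplifying assumptions. Here a tiny logistic regression stands in for the student DNN, the silver labels are hard-coded rather than produced by an LLM, and restricted retrieval is modeled as "filter by attribute, then rank by dot product". The names `train_student`, `student_predict`, and `restricted_retrieval`, and the record fields `id`, `attrs`, and `emb`, are hypothetical and not taken from the paper.

```python
import math


def train_student(features, silver_labels, epochs=500, lr=0.5):
    """Fit a logistic-regression 'student' to teacher-provided silver labels (0 or 1)."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, silver_labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def student_predict(model, x):
    """Cheap inference: True if the student assigns the attribute to this video."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0.0


def restricted_retrieval(user_vec, intent_attr, corpus, k=2):
    """Rank only the videos annotated with intent_attr by dot product with user_vec."""
    pool = [v for v in corpus if intent_attr in v["attrs"]]
    pool.sort(key=lambda v: sum(u * e for u, e in zip(user_vec, v["emb"])), reverse=True)
    return [v["id"] for v in pool[:k]]


# Hypothetical silver set: video embeddings with teacher labels for 'authentic'.
silver_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 0.9], [0.2, 1.0]]
silver_labels = [1, 1, 0, 0]
student = train_student(silver_features, silver_labels)

# The student annotates the wider corpus; the teacher is never called at this scale.
corpus = [
    {"id": "v1", "emb": [0.9, 0.1], "attrs": set()},
    {"id": "v2", "emb": [0.1, 1.0], "attrs": set()},
    {"id": "v3", "emb": [0.8, 0.3], "attrs": set()},
]
for video in corpus:
    if student_predict(student, video["emb"]):
        video["attrs"].add("authentic")

# A user whose inferred intent is 'authentic' only searches the annotated subset.
print(restricted_retrieval([0.5, 0.5], "authentic", corpus))
```

Note that the restriction happens before ranking: a video without the attribute is never returned, however well its embedding matches the user. How the production system infers intent and scores candidates is not described in this summary, so both are placeholders here.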
Evaluation Highlights
  • Gemini 2.5 Pro achieved an 81.33% F1 score on nuanced attributes, about 18 points above human crowd-sourced raters (63.21% F1)
  • Online A/B testing showed a +0.49% lift in user participation in content creation
  • Live production experiments showed a +0.21% increase in satisfied consumption
Breakthrough Assessment
8/10
Demonstrates a successful, large-scale industrial application of LLMs for subjective content annotation, showing LLMs can outperform humans on consistency and directly drive engagement metrics.