
RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment

Yuecheng Li, Hengwei Ju, Zeyu Song, Wei Yang, Chi Lu, Peng Jiang, Kun Gai
Kuaishou Technology, Fudan University, University of Southern California
arXiv (2026)
Recommendation MM P13N KG

📝 Paper Summary

Multimodal Recommendation, LLM-Enhanced Recommendation, Graph Neural Networks (GNN) for RecSys
RecGOAT bridges the semantic gap between Large Model knowledge and recommendation ID signals using dual-granularity alignment: instance-level contrastive learning and distribution-level optimal adaptive transport.
Core Problem
Existing methods fail to align the rich, general semantic knowledge of Large Models (LLMs/LVMs) with the sparse, specific ID-based signals used in collaborative filtering.
Why it matters:
  • Semantic heterogeneity prevents recommendation systems from fully leveraging the reasoning and world knowledge capabilities of Large Models
  • Current alignment techniques are limited to local instance-level matching, missing global distribution-level patterns essential for robust cross-modal understanding
  • Misaligned representations lead to suboptimal performance, particularly in large-scale scenarios where ID sparsity is high and semantic understanding is crucial
Concrete Example: A generative Large Model understands an item as a 'vintage leather jacket' with specific visual attributes, while a recommendation system sees it only as 'ID: 49201' linked to user clicks. Without alignment, the recommender cannot transfer the semantic understanding of 'vintage' to similar items that users who clicked 'ID: 49201' might like but have not clicked yet.
Key Novelty
Graph Optimal Adaptive Transport (RecGOAT)
  • Treats modal alignment as an optimal transport problem, moving the distribution of LLM-enhanced semantic features to match the distribution of collaborative ID features
  • Uses a dual-granularity approach: instance-level contrastive learning for local discriminability and distribution-level transport for global semantic consistency
  • Introduces adaptive learnable parameters into the transport matrix, allowing the geometric alignment to be fine-tuned by downstream recommendation tasks
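The two alignment granularities listed above can be illustrated with a minimal sketch. The summary names neither the contrastive objective nor the transport solver, so the choices here are assumptions: an InfoNCE-style loss for the instance level, and entropy-regularised optimal transport solved with Sinkhorn iterations over a cosine-distance cost for the distribution level. The function names, the temperature, and epsilon are hypothetical, and the adaptive learnable parameters of the transport matrix are not modeled.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]

def info_nce(semantic, collaborative, temperature=0.2):
    """Instance level (assumed InfoNCE form): item i's semantic vector
    should be closer to its own ID vector than to any other item's."""
    s = [normalize(v) for v in semantic]
    c = [normalize(v) for v in collaborative]
    loss = 0.0
    for i, s_i in enumerate(s):
        logits = [dot(s_i, c_j) / temperature for c_j in c]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(s)

def sinkhorn_plan(cost, epsilon=0.1, iterations=200):
    """Distribution level (assumed solver): entropy-regularised transport
    plan between two uniform distributions, via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_cost(semantic, collaborative, epsilon=0.1):
    """Cosine-distance cost matrix, its transport plan, and the total cost."""
    s = [normalize(x) for x in semantic]
    c = [normalize(x) for x in collaborative]
    cost = [[1.0 - dot(s_i, c_j) for c_j in c] for s_i in s]
    plan = sinkhorn_plan(cost, epsilon)
    total = sum(plan[i][j] * cost[i][j]
                for i in range(len(s)) for j in range(len(c)))
    return plan, total

# Toy data: three items with roughly matching semantic and ID embeddings.
semantic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collaborative = [[2.0, 0.1], [0.1, 2.0], [1.0, 1.2]]
contrastive_loss = info_nce(semantic, collaborative)
plan, ot_cost = transport_cost(semantic, collaborative)
```

In a training setup along these lines, the contrastive term and the transport cost would presumably be added to the recommendation loss with tunable weights; the summary does not state how RecGOAT combines them.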
Architecture
Figure 2: The overall architecture of RecGOAT, illustrating the data flow from LLM-enhanced feature extraction through dual-granularity alignment to final prediction.
Evaluation Highlights
  • Outperforms state-of-the-art baselines on three public datasets (Amazon-Baby, Amazon-Sports, Tiktok) across Recall@20 and NDCG@20 metrics
  • Achieves a 1.48% improvement in CTR and a 1.63% improvement in GMV in online A/B testing on a large-scale advertising platform
  • Shows consistent performance gains over LLM-based competitors such as TALLRec and over various multimodal GNN baselines
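For readers unfamiliar with the offline metrics cited above, the following sketch shows how Recall@K and NDCG@K are commonly computed for a single user. It assumes binary relevance and the standard log2 discount. The paper's exact evaluation protocol is not described in this summary, and the function names are illustrative.

```python
import math

def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of a user's relevant items that appear in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k=20):
    """Discounted gain of the top-k list, normalised by the ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the model ranks four items; the user interacted with a, d, z.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "d", "z"}
recall = recall_at_k(ranked, relevant, k=4)
ndcg = ndcg_at_k(ranked, relevant, k=4)
```

Dataset-level scores are usually obtained by averaging these per-user values over all test users.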
Breakthrough Assessment
7/10
Solid theoretical grounding with Optimal Transport for distribution alignment, backed by both extensive offline benchmarks and real-world industrial deployment. Addresses a critical semantic gap in current LLM-RecSys integration.