LVLM: Large Vision-Language Model, a model that processes both images and text, typically pre-trained on large-scale image and text data
LoRA: Low-Rank Adaptation, a fine-tuning technique that freezes a large model's pre-trained weights W and trains only two small matrices B and A, whose product BA (of rank at most r) is added to W as the weight update
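A minimal sketch of the LoRA update in plain Python (no ML framework; the class `LoRALinear` and the helper functions are hypothetical names). The layer keeps the frozen weight W and adds a scaled low-rank path (alpha / r) * B A x. Following the original LoRA paper, A starts with small Gaussian values and B with zeros, so the adapted layer initially behaves exactly like the frozen one.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, d_out x d_in
        # A: r x d_in with small random values; B: d_out x r of zeros, so B @ A = 0 at start
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self):
        """W + (alpha / r) * B @ A, so inference needs no extra path."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 identity "pre-trained" weight adapted with rank r = 1
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, r=1)
print(layer.trainable_params())             # 8 (versus 16 entries in W)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0], since B starts at zero
```

Only A and B, r * (d_in + d_out) numbers in total, would be trained; after training, B A can be folded into W with `merged_weight`, which is why LoRA adds no inference cost.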
Gradient Conflict: A phenomenon in which gradients from different objectives (or modalities) point in opposing directions (a negative inner product), so an update that helps one objective hurts the other and convergence is hindered
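A common way to detect gradient conflict is to check the sign of the inner product (equivalently, the cosine similarity) between two objectives' gradients on shared parameters. The sketch below does that and, for illustration only, applies the projection step of PCGrad (Yu et al., 2020), a known mitigation from multi-task learning; it is not necessarily what SDA does, and the gradient values are made up.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


def conflicts(g1, g2):
    """Two gradients conflict when their inner product is negative."""
    return dot(g1, g2) < 0


def project_conflicting(g1, g2):
    """PCGrad-style surgery: if g1 conflicts with g2, remove g1's component along g2."""
    d = dot(g1, g2)
    if d >= 0:
        return list(g1)
    coef = d / dot(g2, g2)
    return [a - coef * b for a, b in zip(g1, g2)]


# Usage: hypothetical gradients of a visual loss and a text loss on shared parameters
g_visual = [1.0, 1.0]
g_text = [-1.0, 0.5]
print(conflicts(g_visual, g_text))            # True (inner product is -0.5)
print(cosine(g_visual, g_text))               # about -0.316
print(project_conflicting(g_visual, g_text))  # approximately [0.6, 1.2], orthogonal to g_text
```

After projection, the visual gradient no longer has a component that directly undoes progress on the text objective; the full PCGrad method applies this symmetrically across all task pairs.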
Hit@10: A metric measuring the fraction of test cases in which the ground-truth item appears among the top 10 recommendations
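NDCG@10: Normalized Discounted Cumulative Gain, a ranking metric that rewards a correct item more the higher it is ranked within the top 10 (each item's gain is divided by log2 of its position plus one) and normalizes by the score of an ideal ranking, so values lie in [0, 1]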
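Both metrics are easiest to see in the common leave-one-out setting, where each test user has exactly one held-out ground-truth item; that setting is an assumption here, since the glossary does not state the evaluation protocol. Under it, NDCG@10 reduces to 1 / log2(rank + 1) when the item is in the top 10 and 0 otherwise. A minimal sketch (function names are hypothetical):

```python
import math


def hit_at_k(ranked_items, target, k=10):
    """1.0 if the held-out target item is among the first k recommendations."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k with a single relevant item.

    DCG = 1 / log2(rank + 1) if the target sits at 1-based position rank <= k, else 0.
    The ideal ranking puts the target first, so IDCG = 1 / log2(2) = 1 and NDCG = DCG.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings, targets, k=10):
    """Average Hit@k and NDCG@k over test cases (one ranking and one target each)."""
    n = len(targets)
    hit = sum(hit_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hit, ndcg


# Usage: targets ranked 1st, 3rd, and outside the top 10
rankings = [list(range(20)), [5, 6, 7, 8], list(range(100, 120))]
targets = [0, 7, 3]
print(evaluate(rankings, targets))  # Hit@10 = 2/3, NDCG@10 = (1 + 1/log2(4)) / 3 = 0.5
```

The example shows the difference between the two metrics: Hit@10 counts the 3rd-place hit the same as the 1st-place one, while NDCG@10 gives it only half the credit.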
Long-tail items: Items with very few user interactions, making them hard to recommend using collaborative filtering alone
Intra-modal structure: The similarity relationships between items within a single modality (e.g., how similar item A's text is to item B's text)
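Concretely, a modality's intra-modal structure can be represented as an item-by-item similarity matrix computed from that modality's embeddings alone. The sketch assumes cosine similarity and precomputed embeddings (for example, from the text side of an LVLM); both are illustrative choices, not details taken from the method.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between items' embeddings from ONE modality."""
    units = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e)) or 1.0  # guard against all-zero vectors
        units.append([x / norm for x in e])
    return [[sum(a * b for a, b in zip(u, v)) for v in units] for u in units]


# Usage: toy 2-d text embeddings for items A, B, C
text_emb = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
S_text = cosine_similarity_matrix(text_emb)
print(S_text[0][1])  # about 0.707: A's text is fairly similar to B's
print(S_text[0][2])  # 0.0: A's text and C's text share no direction
```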
CMSA: Cross-Modal Structural Alignment, SDA's component that aligns modalities by using intra-modal structure as a teacher signal
MoDA: Modality-Disentangled Adaptation, SDA's component that routes visual and textual updates through separate low-rank experts
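One way to picture modality-disentangled adaptation is a frozen layer with one LoRA-style expert per modality, where each input is sent to its modality's expert. This is a hypothetical sketch: the glossary does not say how routing is done, and hard routing by a per-input modality tag, the class name `ModalityRoutedLoRA`, and its parameters are assumptions.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ModalityRoutedLoRA:
    """Frozen weight W plus one low-rank expert (A, B) per modality.

    Each input is routed by its modality tag to exactly one expert, so within this
    layer, gradients from visual inputs reach only the visual expert and gradients
    from text inputs reach only the text expert (hypothetical hard-routing design).
    """

    def __init__(self, weight, r, modalities=("visual", "text"), alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen, shared by all modalities
        self.scale = alpha / r
        self.experts = {}
        for m in modalities:
            A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
            B = [[0.0] * r for _ in range(d_out)]  # zero init: each expert starts as a no-op
            self.experts[m] = (A, B)

    def forward(self, x, modality):
        A, B = self.experts[modality]  # raises KeyError for an unknown modality
        base = matvec(self.weight, x)
        update = matvec(B, matvec(A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


# Usage: pretend training has changed only the text expert's B matrix
W = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
layer = ModalityRoutedLoRA(W, r=1)
layer.experts["text"][1][0][0] = 1.0
print(layer.forward([1.0, 2.0, 3.0], "visual"))  # [1.0, 2.0, 3.0]: visual expert untouched
print(layer.forward([1.0, 2.0, 3.0], "text"))    # first entry shifted by the text expert
```

Under this assumption, the two modalities never write to the same adapter parameters, which is one way to sidestep the gradient conflict described above.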