MSR: Multimodal Sequential Recommendation—systems that use both visual and textual item data, together with a user's interaction history, to predict which item the user will engage with next
LVLM: Large Vision Language Model—models like GPT-4V capable of processing and reasoning about both images and text
SASRec: Self-Attentive Sequential Recommendation—a standard baseline model using attention mechanisms to capture sequential patterns
H@k: Hit Ratio at k—the fraction of test cases in which the ground-truth item appears among the top-k recommended items
N@k: Normalized Discounted Cumulative Gain at k (NDCG@k)—a ranking metric that gives more credit the higher the correct item is placed within the top-k list, and zero if it falls outside it
Reranker: A strategy where a model re-orders a candidate list generated by another retrieval system, rather than searching the entire catalog itself
Item Enhancer: A strategy using LVLMs to generate rich textual descriptions (captions) from item images to augment metadata
Hallucination: When a generative model recommends items that do not exist or are not in the valid candidate list
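
To make the H@k, N@k, and Hallucination entries concrete, here is a minimal Python sketch. It assumes the common next-item setup with exactly one ground-truth item per test case, so the ideal DCG is 1 and N@k reduces to 1 / log2(rank + 1). The function names (`hit_at_k`, `ndcg_at_k`, `drop_hallucinated`) and the toy item IDs are hypothetical, chosen only for illustration. They are not taken from any particular library or from the evaluated systems.

```python
import math
from typing import Hashable, Iterable, List, Sequence


def hit_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """Return 1.0 if the ground-truth item is in the top-k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """NDCG@k under the assumption of a single relevant item per test case.

    With one relevant item the ideal DCG is 1, so the score is
    1 / log2(rank + 1), where rank is the 1-based position of the target.
    """
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def drop_hallucinated(ranked: Iterable[Hashable],
                      candidates: Iterable[Hashable]) -> List[Hashable]:
    """Keep only items from the valid candidate list, removing duplicates.

    This is one simple way to handle hallucinated items in a reranker's
    output; the relative order of the surviving items is preserved.
    """
    valid = set(candidates)
    seen = set()
    cleaned = []
    for item in ranked:
        if item in valid and item not in seen:
            cleaned.append(item)
            seen.add(item)
    return cleaned


# Toy example: "x" is not a candidate (hallucinated) and "c" is repeated.
candidates = ["a", "b", "c", "d"]
model_output = ["x", "c", "a", "c", "b"]
cleaned = drop_hallucinated(model_output, candidates)

print(cleaned)                      # ['c', 'a', 'b']
print(hit_at_k(cleaned, "a", 1))    # 0.0
print(hit_at_k(cleaned, "a", 2))    # 1.0
print(ndcg_at_k(cleaned, "a", 2))   # 1 / log2(3), roughly 0.63
```

Averaging `hit_at_k` and `ndcg_at_k` over all test cases gives the reported H@k and N@k values. Whether hallucinated items are dropped before scoring or counted as misses is an evaluation choice, and this sketch shows only the first option.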