AD-DRL: Attribute-Driven Disentangled Representation Learning—the proposed method that uses attribute labels to supervise the separation of latent factors
Disentangled Representation Learning: Techniques to separate the underlying explanatory factors of data into disjoint parts of the representation
BPR: Bayesian Personalized Ranking, a pairwise ranking loss widely used in recommender systems that trains the model to score an item the user interacted with above one the user did not
ViT: Vision Transformer—a model architecture for image processing that splits images into patches and processes them with transformers
BERT: Bidirectional Encoder Representations from Transformers—a transformer-based model for natural language processing
Multimodal features: Data drawn from different modalities, such as text (reviews) and images (product photos)
Intra-modality disentanglement: Separating factors (like brand vs. price) within a single data type (e.g., text) using classifiers
Inter-modality disentanglement: Aligning representations of the same factor across different data types (e.g., ensuring 'brand' in text matches 'brand' in images) using contrastive loss
Softplus: A smooth approximation of the ReLU activation function, softplus(x) = ln(1 + e^x), whose output is always positive
DRML: Disentangled Representation Learning for Multimodal Recommendation—a baseline method that uses attention to disentangle factors without explicit attribute supervision
MMGCN: Multimodal Graph Convolutional Network—a baseline method that builds a user-item graph for each modality
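
The BPR and Softplus entries above are linked by a standard identity: the usual BPR loss for one (user, positive item, negative item) triple, -ln(sigmoid(s_pos - s_neg)), equals softplus(s_neg - s_pos). The sketch below illustrates this under stated assumptions. The function names `softplus` and `bpr_loss` and the scalar scores are hypothetical choices for illustration; this is the generic textbook form, not AD-DRL's actual implementation, and it says nothing about how the paper itself applies Softplus.

```python
import math


def softplus(x: float) -> float:
    """Softplus, ln(1 + e^x), computed in a numerically stable way."""
    # max(x, 0) + log1p(exp(-|x|)) is algebraically equal to ln(1 + e^x)
    # but avoids overflow when |x| is large.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Generic BPR loss for a single (user, positive, negative) triple.

    Uses the identity -ln(sigmoid(pos - neg)) == softplus(neg - pos).
    """
    return softplus(neg_score - pos_score)


if __name__ == "__main__":
    # The loss is small when the positive item already outranks the negative one,
    # and large when the ranking is inverted.
    print(bpr_loss(pos_score=2.0, neg_score=0.0))
    print(bpr_loss(pos_score=0.0, neg_score=2.0))
```

Writing the loss through softplus rather than composing `log` and `sigmoid` directly is a common design choice because it stays finite for large score gaps.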