NDR: Narrative-Driven Recommendation—recommending items based on long, detailed natural language descriptions of user needs.
Mint: The proposed method (Data augMentation with INteraction narraTives), which generates synthetic training queries from user interaction history.
Bi-encoder: A retrieval model that encodes query and document separately into vectors, allowing fast approximate nearest neighbor search.
Cross-encoder: A re-ranking model that processes the query and document together, with full attention across both; more accurate but slower than a bi-encoder.
InstructGPT: A 175B-parameter large language model fine-tuned to follow instructions (used here to generate synthetic queries).
FlanT5: A smaller instruction-tuned model used here for filtering synthetic data via query likelihood.
NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for the position of relevant items.
Query Likelihood: A scoring method that estimates how likely a query is to be generated by a language model conditioned on a document; used here to denoise synthetic data.
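
As an illustration of the NDCG entry above, here is a minimal sketch of one common formulation (linear gain with a log2 position discount). The function names `dcg` and `ndcg` are hypothetical, and the exact variant and cutoff used in the source work are not stated here, so treat this as an assumption rather than the evaluation code actually used. It also assumes the input list contains every judged item, so that sorting it yields the ideal ranking.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    # Rank 1 has discount log2(2) = 1, so the top item counts in full.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    # Convention assumed here: no relevant items gives a score of 0.
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades listed in the order the system ranked the items.
perfect = ndcg([3, 2, 1], k=3)   # already in ideal order
swapped = ndcg([0, 1], k=2)      # the relevant item is ranked second
```

A ranking that is already in descending order of relevance scores 1.0, and pushing a relevant item further down the list lowers the score, which is the position sensitivity the definition refers to.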