GQR: Generative Query Recommendation—the proposed framework treating recommendation as a conditional generation task aligned with user preferences
CTR: Click-Through Rate—the ratio of users who click on a specific link to the total number of users who view the page, used here as the primary reward signal
DPO: Direct Preference Optimization—an algorithm that aligns language models to preferences directly from pairs of preferred and rejected outputs, without training a separate reward model or running a reinforcement-learning loop
PRM: Process Reward Model—a model that provides feedback on intermediate steps or specific components of generation (here, the CTR predictor acting as a reward model)
SFT: Supervised Fine-Tuning—training the model on labeled examples before alignment
Co-occurrence Retrieval: Finding queries that frequently appear together in historical search logs to capture user search patterns
SimCSE: Simple Contrastive Learning of Sentence Embeddings—a contrastive learning framework used here to train the query retrieval model
ERNIE: Enhanced Representation through Knowledge Integration—a pre-trained language model architecture used as the backbone for the semantic matching model
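
To make the DPO entry above concrete, here is a minimal sketch of the standard DPO loss for a single preferred/rejected pair. The function names, the argument names, and the default `beta` are illustrative assumptions, not taken from the GQR work, and how the pairs are constructed (for example, from predicted CTR) is not specified in this glossary.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model (typically the SFT model).
    beta=0.1 is an assumed, commonly used default.
    """
    # How much more the policy favors each output than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)
```

When the policy and the reference agree, the loss is log 2; it falls below that as the policy shifts probability toward the preferred output relative to the reference.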