From Prompting to Alignment: A Generative Framework for Query Recommendation

📝 Paper Summary

Query Recommendation, Generative Search

GQR unifies query recommendation tasks into a single generative framework that aligns LLMs with user click preferences via a CTR-based reward model and grounds generation in retrieved co-occurrence patterns.

Core Problem

Traditional query recommendation relies on sparse historical logs and therefore fails on cold-start queries. Existing LLM approaches, meanwhile, generate queries that are semantically plausible but rarely clicked, because they are not aligned with real user feedback.

Why it matters:

Sparsity in historical interactions makes conventional methods ineffective for long-tail or new queries
Existing solutions are siloed (separate models for suggestion vs. completion), limiting generalization to new contexts like conversational search
Without aligning to click signals, LLMs may produce hallucinations or irrelevant suggestions that degrade user experience in commercial search engines

Concrete Example: In a cold-start scenario where a user types a novel query, a log-based system returns nothing due to zero co-occurrence data. A standard LLM might generate a grammatically correct but irrelevant query based on internal knowledge. GQR aims to generate a query that is both semantically relevant and statistically likely to be clicked.

Key Novelty

Generative Query Recommendation (GQR) with CTR-Alignment

Unifies diverse tasks (suggestion, completion, clarification) under one generative prompt template rather than separate specialized models
Treats the Click-Through Rate (CTR) predictor as a Process Reward Model (PRM) that guides the LLM via Direct Preference Optimization (DPO), so that outputs better match user click preferences
Augments the LLM with 'User Initiative' by retrieving co-occurrence queries as side information, bridging the gap between the model's internal knowledge and proactive search patterns

Architecture

The overall learning framework of GQR, illustrating the cycle of SFT, CTR Alignment, and Periodic Updates.

Evaluation Highlights

Achieves an improvement of over 60% in CTR (Click-Through Rate) compared to LLM baselines in Baidu's conversational search services
Unified framework successfully deployed across three distinct scenarios (suggestion, completion, clarification) within a large-scale commercial system

Breakthrough Assessment

7/10

Novel application of alignment (DPO) specifically to CTR maximization in query recommendation. Strong commercial deployment claims (over 60% CTR improvement), though detailed experimental breakdowns are missing from the provided text.

⚙️ Technical Details

Problem Definition

Setting: Generative query recommendation where an LLM generates a list of candidate queries given a user input and context

Inputs: User input query text q, optional side information S (e.g., session history, retrieved co-occurrence queries)

Outputs: A list of recommended queries RQ = {q1, q2, ..., qN} and optional auxiliary texts T
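To make the single template concrete, here is a minimal sketch of how one prompt format could serve suggestion, completion, and clarification, with side information S appended when available, plus a parser that turns the raw generation into RQ. The instruction strings, the names build_prompt and parse_recommendations, and the one-query-per-line output format are all hypothetical; the paper's actual template is not given in this summary.

```python
import re
from typing import List, Optional

# Hypothetical task instructions: the paper's real template wording is not reported.
TASK_INSTRUCTIONS = {
    "suggestion": "Suggest follow-up queries the user may want to search next.",
    "completion": "Complete the user's partial input into full search queries.",
    "clarification": "Propose queries that clarify the user's ambiguous intent.",
}

# Leading list marker such as "-", "*", "1." or "2)" followed by whitespace.
_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+")


def build_prompt(task: str, query: str,
                 side_info: Optional[List[str]] = None, n: int = 5) -> str:
    """Fill one shared template for every task (hypothetical format)."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task}")
    lines = [f"Task: {TASK_INSTRUCTIONS[task]}", f"User input: {query}"]
    if side_info:
        # Side information S, e.g. session history or retrieved co-occurrence queries.
        lines.append("Related queries: " + "; ".join(side_info))
    lines.append(f"Output {n} recommended queries, one per line.")
    return "\n".join(lines)


def parse_recommendations(generation: str) -> List[str]:
    """Turn raw LLM output into the list RQ = [q1, ..., qN], dropping blanks and duplicates."""
    seen, result = set(), []
    for line in generation.splitlines():
        q = _MARKER.sub("", line).strip()
        if q and q not in seen:
            seen.add(q)
            result.append(q)
    return result


prompt = build_prompt("completion", "how to cook", side_info=["how to cook rice"], n=3)
print(prompt)
```

Keeping the task as a field inside one template, rather than training one model per task, is what lets a new scenario (for example a conversational follow-up) be added by writing a new instruction string.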

Pipeline Flow

Data Preparation: Co-Occurrence Retrieval → Prompt Construction
Generation: Aligned LLM generates candidate list
Feedback Loop (Training): Impression Logs → CTR Predictor → DPO Update

System Modules

Co-Occurrence Retriever

Retrieves historical user-initiated queries associated with the current input to represent proactive search intent

Model or implementation: ERNIE-based semantic matching model trained with SimCSE
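One plausible reading of this module, sketched below under loud assumptions: co-occurrence counts are mined from session logs, and a new input (possibly unseen in the logs) is first matched to its most similar logged query by embedding similarity, whose co-occurring queries are then returned as side information. The toy vectors stand in for ERNIE/SimCSE embeddings, and the function names and the session-level co-occurrence definition are hypothetical, not the paper's implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mine_cooccurrence(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often two distinct queries are issued in the same session."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        unique = list(dict.fromkeys(session))  # de-duplicate, keep order
        for a in unique:
            for b in unique:
                if a != b:
                    table[a][b] += 1
    return table


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def retrieve_side_info(query_vec: Vector, index: Dict[str, Vector],
                       cooc: Dict[str, Counter], k: int = 3) -> List[str]:
    """Match the input to its nearest logged query, then return that query's top-k co-occurring queries."""
    if not index:
        return []
    anchor = max(index, key=lambda q: cosine(query_vec, index[q]))
    return [q for q, _ in cooc.get(anchor, Counter()).most_common(k)]
```

The semantic-matching step is what helps with cold start in this sketch: a novel input has no co-occurrence row of its own, but it can borrow the row of a similar logged query.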

Generative LLM

Generates a list of recommended queries and auxiliary text based on the prompt and side information

Model or implementation: Pre-trained LLM (e.g., 7B or 13B parameters)

CTR Predictor

Estimates the click probability of a generated query to serve as a reward signal for alignment

Model or implementation: BERT-based encoder with 2-layer MLP head
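The head and its training loss can be sketched in plain Python as follows, assuming the BERT encoder has already produced a pooled embedding for the (input, recommended query) pair. The ReLU activation, the weight layout, and the function names are assumptions; the summary only states "2-layer MLP head" and a binary cross-entropy objective.

```python
import math
from typing import List, Sequence


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """y = W x + b, with W stored as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ctr_head(embedding: Sequence[float], w1, b1, w2, b2) -> float:
    """2-layer MLP (assumed ReLU hidden layer) over a pooled encoder embedding -> click probability."""
    hidden = [max(0.0, h) for h in linear(embedding, w1, b1)]
    logit = linear(hidden, w2, b2)[0]
    return sigmoid(logit)


def bce(p: float, clicked: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one impression; clicked is 1 (click) or 0 (no click)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(clicked * math.log(p) + (1 - clicked) * math.log(1.0 - p))
```

Averaging bce over all logged impressions gives the training objective; at inference time the same ctr_head output is the per-query score used as the reward signal.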

Novel Architectural Elements

Integration of a BERT-based CTR predictor as a Process Reward Model (PRM) directly into the LLM alignment loop
Injection of retrieved co-occurrence queries as 'side information' within a universal generative prompt template to align with proactive search intent

Modeling

Base Model: Pre-trained LLM with 7B or 13B parameters (the specific model family, e.g. Llama, is not named)

Training Method: Iterative Direct Preference Optimization (DPO)

Objective Functions:

Purpose (CTR predictor): Minimize the difference between predicted click probability and observed clicks.

Formally: Binary cross-entropy loss over impression logs.
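
Purpose (alignment): Raise the likelihood of preferred (high-CTR) responses relative to rejected (low-CTR) responses, while staying close to the reference model.

Formally: DPO loss L_DPO(π_θ; π_ref) = -E_(x, y_c, y_r) [ log σ( β·log(π_θ(y_c|x) / π_ref(y_c|x)) - β·log(π_θ(y_r|x) / π_ref(y_r|x)) ) ], where y_c and y_r are the chosen and rejected responses and σ is the sigmoid function.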
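The DPO objective can be evaluated directly from four sequence log-probabilities per preference pair. This is a minimal sketch of the standard DPO loss, not the authors' training code; beta = 0.1 is a placeholder because the paper does not report the value, and the tuple layout is an assumption.

```python
import math
from typing import List, Tuple

# (policy log p(y_c|x), policy log p(y_r|x), reference log p(y_c|x), reference log p(y_r|x))
LogProbs = Tuple[float, float, float, float]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(batch: List[LogProbs], beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of preference pairs.

    Each log-probability is the sum of token log-probs over the whole response.
    beta = 0.1 is a placeholder; the paper does not report its value.
    """
    total = 0.0
    for pol_c, pol_r, ref_c, ref_r in batch:
        margin = beta * (pol_c - ref_c) - beta * (pol_r - ref_r)
        total -= log_sigmoid(margin)
    return total / len(batch)
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls below that only when the policy shifts probability toward the chosen (high-CTR) list relative to the rejected one.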

Training Data:

SFT Data: Annotated by experts or large LLMs (GPT)
CTR Data: 14 days of impression and click logs from the online system
Preference Pairs: Constructed by scoring SFT outputs with CTR predictor; 'Chosen' = high cumulative CTR, 'Rejected' = low cumulative CTR (length-balanced)
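A hedged sketch of the pair-construction step: score each candidate list from the SFT model query by query, sum the scores into a cumulative CTR, and take the highest- and lowest-scoring lists as chosen and rejected. Interpreting "length-balanced" as "compare only lists with the same number of queries" is an assumption, as are the min_margin filter and the function name.

```python
from typing import Callable, List, Optional, Tuple

QueryList = List[str]


def build_preference_pair(
    candidates: List[QueryList],
    ctr_score: Callable[[str], float],
    n: int,
    min_margin: float = 0.0,
) -> Optional[Tuple[QueryList, QueryList]]:
    """Pick (chosen, rejected) by cumulative predicted CTR among lists of exactly n queries.

    Restricting to equal-length lists is one reading of "length-balanced": otherwise
    a longer list could win simply by summing more per-query scores.
    """
    same_length = [c for c in candidates if len(c) == n]
    if len(same_length) < 2:
        return None
    scored = sorted(((sum(ctr_score(q) for q in c), c) for c in same_length),
                    key=lambda pair: pair[0])
    (low, rejected), (high, chosen) = scored[0], scored[-1]
    if high - low <= min_margin:
        return None  # no clear preference; skip this prompt
    return chosen, rejected
```

In practice ctr_score would wrap the CTR predictor applied to the (input, query) pair; a plain dictionary lookup is enough to exercise the logic.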

Key Hyperparameters:

delta: Predefined threshold for stopping iterative DPO (value not specified)
beta: KL penalty coefficient in DPO (implied, value not specified)
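The role of delta can be illustrated with a generic stopping loop. Everything here is hypothetical: the summary only says iterative DPO stops at a predefined threshold, so the choice of "per-round gain in mean predicted CTR" as the stopping quantity, the max_rounds cap, and the callback names are assumptions.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")


def iterative_dpo(
    model: M,
    dpo_round: Callable[[M], M],
    mean_reward: Callable[[M], float],
    delta: float,
    max_rounds: int = 10,
) -> Tuple[M, List[float]]:
    """Repeat DPO rounds until the reward gain of one round drops below delta.

    dpo_round is assumed to regenerate candidates with the current model, rebuild
    CTR-scored preference pairs, run DPO, and return the updated model.
    mean_reward is assumed to be the average predicted CTR of the model's outputs.
    """
    history = [mean_reward(model)]
    for _ in range(max_rounds):
        candidate = dpo_round(model)
        reward = mean_reward(candidate)
        gain = reward - history[-1]
        if gain > 0:
            model = candidate  # keep only updates that improved the reward
            history.append(reward)
        if gain < delta:
            break  # converged (or regressed): stop iterating
    return model, history
```

Passing the training round and evaluation in as callbacks keeps the stopping rule independent of any particular training library.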

Compute: Not reported in the paper

Comparison to Prior Work

vs. CQR: GQR uses generative LLMs to handle cold-start/long-tail queries where logs are sparse
vs. LLM-based Recommendation: GQR aligns generation directly with user clicks (CTR) using DPO, whereas standard LLM methods rely on inherent knowledge or static few-shot examples
vs. Rank-based methods: GQR generates queries token-by-token rather than retrieving and ranking existing candidates

Limitations

Dependency on large-scale online impression logs for training the CTR predictor
Risk of overlapping/redundant queries when maximizing CTR (mitigated by GPT filtering, but still a concern)
Requires complex periodic updates to keep co-occurrence data and CTR models fresh
Computational cost of invoking LLMs for every query generation in an online system

Reproducibility

No replication artifacts mentioned in the paper. The system is deployed in a proprietary commercial search engine (Baidu). Training data (search logs) is private.

📊 Experiments & Results

Evaluation Setup

Deployed in Baidu's conversational search services, with both offline and online evaluation

Benchmarks:

Baidu Search Logs [New]: query recommendation (suggestion, completion, clarification)

Metrics:

CTR (Click-Through Rate)
User Experience (qualitative)
Statistical methodology: Not explicitly reported in the paper
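The headline CTR comparison reduces to simple arithmetic, sketched below. The click and impression counts are illustrative only and are not from the paper; the summary also does not state whether "over 60%" is a relative lift (as computed here) or an absolute difference in percentage points.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(treatment: float, baseline: float) -> float:
    """Relative change of treatment CTR over baseline CTR (0.6 means +60%)."""
    if baseline <= 0:
        raise ValueError("baseline CTR must be positive")
    return (treatment - baseline) / baseline


# Illustrative counts only; these are not numbers from the paper.
base = ctr(clicks=500, impressions=10_000)  # 0.05
new = ctr(clicks=820, impressions=10_000)   # 0.082
print(f"relative CTR lift: {relative_lift(new, base):.0%}")
```

Note that a 60% relative lift on a 5% baseline is only 3 percentage points in absolute terms, which is why the distinction matters when reading the headline number.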

Main Takeaways

The GQR system achieves over 60% improvement in CTR compared to baseline LLM approaches when deployed in a commercial setting.
Aligning LLMs with a CTR predictor (via DPO) effectively bridges the gap between semantic plausibility and user utility.
Injecting co-occurrence queries as side information helps ground the LLM in proactive search intents, mitigating the disconnect between the model and user behavior.
The framework generalizes across three distinct tasks: query suggestion, completion, and clarification.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Query Recommendation tasks (completion, suggestion)
Reinforcement Learning from Human Feedback (RLHF) concepts
Knowledge of BERT and LLM architectures

Key Terms

GQR: Generative Query Recommendation—the proposed framework treating recommendation as a conditional generation task aligned with user preferences

CTR: Click-Through Rate, the number of clicks on an item (here, a recommended query) divided by the number of times it was shown (impressions); used here as the primary reward signal

DPO: Direct Preference Optimization, an algorithm that aligns language models directly on pairs of preferred/rejected outputs, without running a reinforcement learning loop against a reward model

PRM: Process Reward Model—a model that provides feedback on intermediate steps or specific components of generation (here, the CTR predictor acting as a reward model)

SFT: Supervised Fine-Tuning—training the model on labeled examples before alignment

Co-occurrence Retrieval: Finding queries that frequently appear together in historical search logs to capture user search patterns

SimCSE: Simple Contrastive Learning of Sentence Embeddings—a contrastive learning framework used here to train the query retrieval model

ERNIE: Enhanced Representation through Knowledge Integration—a pre-trained language model architecture used as the backbone for the semantic matching model