โ† Back to Paper List

Generating Query Recommendations via LLMs

Andrea Bacciu, Enrico Palumbo, Andreas Damianou, Nicola Tonellotto, Fabrizio Silvestri
arXiv (2024)
Recommendation Benchmark

Paper Summary

Query Recommendation, Generative Information Retrieval
GQR (Generative Query Recommendation) uses large language models to generate relevant search query suggestions without relying on historical query logs, outperforming traditional log-based commercial systems.
Core Problem
Existing query recommendation systems rely heavily on massive, private query logs to find patterns, making them ineffective for rare (long-tail) queries or for cold-start scenarios where no logs exist.
Why it matters:
  • Long-tail queries (rare searches) account for a large share of search traffic but have too little historical data for traditional log-based recommenders.
  • Query logs are proprietary and privacy-sensitive, which prevents many researchers and smaller organizations from building effective recommendation systems.
  • Commercial systems often fail to return any suggestions for rare inputs, leaving users without guidance.
Concrete Example: For rare queries that appear only once or twice in the AOL query log, the commercial systems 'System 1' and 'System 2' fail to generate any suggestions 9-17% of the time, whereas GQR (with GPT-3) generates 6 recommendations for every one of these queries.
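The failure rate in this example is simple arithmetic over a set of rare queries. The Python sketch below assumes that "rare" means a query occurring at most twice in the log and that "failure" means an empty suggestion list. The function names and toy queries are hypothetical and are not taken from the paper's evaluation code.

```python
from collections import Counter


def rare_queries(log: list[str], max_count: int = 2) -> list[str]:
    """Distinct queries that occur at most `max_count` times in the log."""
    counts = Counter(log)
    return sorted(q for q, c in counts.items() if c <= max_count)


def failure_rate(suggestions: dict[str, list[str]], queries: list[str]) -> float:
    """Fraction of queries for which a system returned no suggestions at all."""
    if not queries:
        return 0.0
    empty = sum(1 for q in queries if not suggestions.get(q))
    return empty / len(queries)


# Toy data, invented for illustration only (not from the AOL log).
log = ["weather", "weather", "weather",
       "vintage typewriter ribbon",
       "ferry times oban mull", "ferry times oban mull"]
rare = rare_queries(log)  # ['ferry times oban mull', 'vintage typewriter ribbon']
system_output = {"ferry times oban mull": ["calmac timetable"],
                 "vintage typewriter ribbon": []}
print(failure_rate(system_output, rare))  # 0.5
```

A system that always returns a non-empty list, as reported for GQR, scores 0.0 on this measure.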
Key Novelty
Generative Query Recommendation (GQR)
  • Replaces the traditional log-mining paradigm with a generative paradigm using large language models (LLMs) such as GPT-3.
  • Uses few-shot prompting to instruct the LLM to generate diverse and disambiguated query variations based solely on the input query, without needing a historical database.
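The few-shot setup can be sketched as two steps: prompt assembly and output parsing. This is a minimal illustration, and several parts of it are assumptions. The instruction wording, the example queries in `FEW_SHOT_EXAMPLES`, the `Query:`/`Suggestions:` layout, and the helper names are all hypothetical. The LLM call itself (GPT-3 in the paper) is omitted, and a hard-coded string stands in for its completion.

```python
# Hypothetical few-shot examples; the paper's actual prompt is not reproduced here.
FEW_SHOT_EXAMPLES = [
    ("jaguar", ["jaguar car models", "jaguar animal habitat",
                "jacksonville jaguars schedule"]),
    ("python install", ["install python on windows",
                        "pip install package", "python virtual environment"]),
]


def build_prompt(query: str, k: int = 6) -> str:
    """Assemble an instruction plus few-shot examples, ending at the new query."""
    lines = [f"Suggest {k} diverse search queries, one per line, "
             "covering different interpretations of the input query.", ""]
    for example_query, recs in FEW_SHOT_EXAMPLES:
        lines += [f"Query: {example_query}", "Suggestions:"]
        lines += [f"- {r}" for r in recs]
        lines.append("")
    lines += [f"Query: {query}", "Suggestions:"]
    return "\n".join(lines)


def parse_suggestions(completion: str, query: str, k: int = 6) -> list[str]:
    """Keep up to k unique suggestions, dropping blanks and echoes of the input."""
    seen, out = set(), []
    for line in completion.splitlines():
        s = line.strip().lstrip("-").strip()
        key = s.lower()
        if not s or key == query.lower() or key in seen:
            continue
        seen.add(key)
        out.append(s)
        if len(out) == k:
            break
    return out


prompt = build_prompt("oban ferry")
# `completion` stands in for the LLM's text output; no API call is made here.
completion = ("- oban to mull ferry timetable\n- oban ferry terminal parking\n"
              "- Oban Ferry\n- oban to mull ferry timetable\n")
print(parse_suggestions(completion, "oban ferry"))
# ['oban to mull ferry timetable', 'oban ferry terminal parking']
```

Deduplicating the output and dropping echoes of the input is one plausible post-processing choice. It keeps the returned list varied, but the paper's own filtering may differ.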
Evaluation Highlights
  • Outperforms commercial 'System 2' by +23% (Robust04) and +27% (ClueWeb09B) in NDCG@10 when its suggestions are used for query expansion.
  • Achieves ~59% user preference in a human evaluation study, significantly beating two commercial competitors (System 1 at ~26%, System 2 at ~15%).
  • 100% success rate in generating recommendations for long-tail/rare queries, whereas commercial baselines fail up to 17% of the time.
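The query-expansion evaluation has two parts: turning suggestions into an expanded query, then scoring the resulting ranking with NDCG@10. This minimal sketch makes three assumptions. Expansion is plain concatenation. NDCG uses linear gains (some toolkits use 2^rel - 1 instead). The reported +23%/+27% figures are read as relative improvements, which the summary does not state. All names and data are hypothetical.

```python
import math


def expand_query(query: str, suggestions: list[str]) -> str:
    """Naive expansion (an assumption): append suggestion text to the query."""
    return " ".join([query, *suggestions])


def dcg_at_k(gains: list[float], k: int = 10) -> float:
    """Linear-gain DCG: sum of gain / log2(rank + 1) over 1-based ranks <= k."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """DCG of the system ranking divided by DCG of the ideal ordering."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, e.g. 0.25 for +25%."""
    return (new - base) / base


qrels = {"d1": 2, "d2": 1, "d3": 0}  # toy relevance judgments
print(expand_query("oban ferry", ["calmac timetable"]))  # oban ferry calmac timetable
print(ndcg_at_k(["d1", "d2", "d3"], qrels))  # 1.0 (ideal ordering)
print(ndcg_at_k(["d3", "d2", "d1"], qrels))  # ~0.62
print(relative_gain(0.5, 0.4))  # ~0.25 (floating point)
```

Under these assumptions, a "+23%" result would mean `relative_gain(gqr_ndcg, system2_ndcg)` is about 0.23 when both NDCG@10 scores are averaged over the same query set.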
Breakthrough Assessment
7/10
Strong practical contribution demonstrating that LLMs can completely replace log-based systems for query recommendation, with superior performance on rare queries. However, the method relies on standard prompting rather than novel architectural changes.