AQE: Aligned Query Expansion—the proposed method of fine-tuning LLMs to generate retrieval-optimized query expansions
BM25: Best Matching 25—a probabilistic information retrieval function used to rank documents based on query terms
DPO: Direct Preference Optimization—an alignment method that optimizes a model to prefer one response over another without an explicit reward model
RSFT: Rejection Sampling Fine-Tuning—a method where the model is fine-tuned only on the best outputs sampled from its own previous generations
RLHF: Reinforcement Learning from Human Feedback—a technique to align LLMs using a reward model trained on human preferences
GAR: Generation-Augmented Retrieval—a baseline method that expands queries by generating relevant contexts like answers or titles
EAR: Expand, Rerank, and Retrieve—a baseline method that generates multiple query expansions and uses a reranker to select the best one before retrieval
BoN: Best-of-N—a decoding strategy that samples N responses and selects the best one based on a reward model
zero-shot prompting: Asking an LLM to perform a task without providing any worked examples (demonstrations) of that task in the prompt
hallucination: When an LLM generates content that is factually incorrect or irrelevant to the source material
vocabulary mismatch: The problem where terms in a user's query do not literally match the terms in the relevant documents
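
To make the BM25 and vocabulary mismatch entries concrete, here is a minimal sketch of BM25 scoring on a toy corpus. The function name `bm25_scores`, the toy documents, the parameter defaults (`k1=1.2`, `b=0.75`), and the smoothed IDF variant are illustrative assumptions, not details taken from the paper. The hand-written expansion terms stand in for what an LLM-generated expansion might contribute.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        # Simplification: repeated query terms are counted once.
        for term in set(query):
            if term not in tf:
                continue  # a non-matching term contributes nothing
            # Smoothed IDF variant that is always positive (an assumption).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the physician prescribed medication for hypertension".split(),
    "the recipe needs flour and sugar".split(),
]

# The user's wording shares no terms with the relevant document.
original = "doctor pills high blood pressure".split()
# Hypothetical expansion terms appended to the original query.
expanded = original + "physician medication hypertension".split()

original_scores = bm25_scores(original, docs)
expanded_scores = bm25_scores(expanded, docs)
```

With the original query, neither document shares a term with it, so BM25 cannot separate the relevant document from the irrelevant one. The expanded query matches the first document on three terms and leaves the second at zero, which is the gap that query expansion methods such as GAR, EAR, and AQE aim to close.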