generation-calibrated: A property of a retriever where the score assigned to a document is proportional to how much that document improves the quality of the downstream LLM generation
cross-encoder: A retrieval architecture that processes the query and document simultaneously (concatenated) to output a relevance score, typically more accurate but computationally heavier than bi-encoders
KL divergence: Kullback-Leibler divergence, an asymmetric measure of how one probability distribution differs from a second, reference probability distribution; it is non-negative and equals zero only when the two distributions are identical
MPNet: Masked and Permuted Pre-training for Language Understanding—a transformer-based model used here as the backbone for the retriever
FlanT5: A T5 (Text-to-Text Transfer Transformer) model fine-tuned on a collection of tasks phrased as instructions, used here as the auxiliary model to estimate generation likelihoods
LLM: Large Language Model, a neural network with a very large number of parameters, trained on large text corpora to generate human-like text
RAG: Retrieval-Augmented Generation—a technique where an LLM is provided with external documents to improve its responses
bi-encoder: A retrieval architecture where queries and documents are encoded separately into vectors, allowing fast similarity search but often with lower accuracy than cross-encoders
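
Illustration for the "KL divergence" and "generation-calibrated" entries: a minimal sketch, not the method of the work this glossary accompanies. It computes the discrete KL divergence D_KL(P || Q) = sum_i p_i * log(p_i / q_i) between two distributions over the same candidate documents. The scores, the log-likelihood values, the use of a softmax to normalize them, and the direction of the divergence are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """Discrete D_KL(P || Q) in nats, with Q as the reference distribution.

    Terms with p_i == 0 contribute 0 by convention; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical values for three candidate documents (not from the source).
retriever_scores = [2.0, 1.0, 0.1]
generation_log_likelihoods = [-1.2, -0.7, -3.0]

# Assumption: both sets of values are normalized with a softmax.
p_generation = softmax(generation_log_likelihoods)
q_retriever = softmax(retriever_scores)

# Assumption: the retriever distribution is taken as the reference Q.
mismatch = kl_divergence(p_generation, q_retriever)
print(f"KL(generation || retriever) = {mismatch:.4f} nats")
```

Under these assumptions, a value near zero means the retriever ranks documents in roughly the same proportions as their estimated usefulness for generation. Swapping the two arguments generally gives a different number, because KL divergence is not symmetric.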