RAG: Retrieval-Augmented Generation, AI systems that answer questions by first searching for relevant documents and then supplying them to a language model as context for generating the answer
RRF: Reciprocal Rank Fusion, a method for combining multiple ranked lists of search results into a single unified list by scoring each document as the sum of 1/(k + rank) over the lists in which it appears
BM25: A ranking function used by search engines to estimate the relevance of documents to a given search query based on keyword matching
Hit@K: A metric indicating whether at least one relevant document appears in the top K retrieved results
reranking: A second stage of retrieval where a more powerful model (typically a cross-encoder) re-scores a small set of candidate documents
cross-encoder: A transformer model that processes the query and document together in a single pass to output a relevance score; it is generally more accurate but slower than a bi-encoder, which encodes the query and document separately
latency: The time delay between a user's request and the system's response
Jaccard similarity: A measure of overlap between two sets, computed as the size of their intersection divided by the size of their union
truncation: Cutting off the list of retrieved documents to fit within the language model's maximum context window
KB: Knowledge Base—the collection of documents the system searches through
recall: The fraction of relevant documents that are successfully retrieved
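
Several of the terms above (RRF, Hit@K, recall, Jaccard similarity) have short closed-form definitions, so a minimal Python sketch may make them concrete. The function names, the toy document IDs, and the constant k=60 (a commonly used RRF default) are illustrative assumptions, not taken from the system described here.

```python
from collections import defaultdict


def rrf(ranked_lists, k=60):
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in.

    k=60 is an assumed, commonly used default; ranks start at 1.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_at_k(retrieved, relevant, k):
    """True if at least one relevant document is in the top k retrieved results."""
    return any(doc in relevant for doc in retrieved[:k])


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def jaccard(a, b):
    """Intersection over union of two sets (defined here as 1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy example: fuse a keyword ranking with a vector-search ranking.
fused = rrf([["a", "b", "c"], ["b", "c", "d"]])
print(fused)
print(hit_at_k(fused, {"a"}, k=2), recall(fused[:2], {"a", "b"}))
print(jaccard({"a", "b"}, {"b", "c"}))
```

In this toy run, documents that appear in both lists ("b" and "c") rise above documents that appear in only one, which is the behavior RRF is meant to produce.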