RAG: Retrieval-Augmented Generation—AI systems that answer questions by first searching for relevant documents, then generating responses based on what they find
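BM25: Best Matching 25, a ranking function used by search engines to estimate the relevance of documents to a search query from keyword overlap, weighting each matching term by its frequency in the document, its rarity across the collection, and the document's length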
REINFORCE: A basic policy gradient reinforcement learning algorithm that updates model parameters to maximize expected rewards
Knowledge Distillation: A technique where a smaller or target model is trained to reproduce the behavior (output probabilities) of a larger or source model
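Bi-encoder: A retrieval architecture that encodes the query and the document separately into vectors and scores relevance by comparing those vectors (for example, with a dot product), so document vectors can be computed ahead of time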
Cross-encoder: A retrieval architecture that processes query and document simultaneously to capture deeper interactions, used here for the RAG retriever
T5: Text-to-Text Transfer Transformer—a pre-trained language model that treats all NLP tasks as a text generation problem
PPL: Perplexity, a measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower values indicate better performance
MIPS: Maximum Inner Product Search, the problem of finding the vectors in a database whose inner (dot) product with a query vector is highest; in practice it is usually solved with approximate indexes so that search stays fast over large collections
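
Illustrative sketch (not from the source): three of the definitions above (BM25, PPL, MIPS) reduce to short formulas, shown here in plain Python. The function names, the default values k1=1.5 and b=0.75, the smoothed IDF variant, and the exhaustive (non-approximate) search are all assumptions chosen for illustration, not the setup used by any particular system.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def perplexity(token_probs):
    """Exponential of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def mips(query_vec, doc_vecs):
    """Exhaustive MIPS: index of the vector with the largest dot product."""
    dots = [sum(q * x for q, x in zip(query_vec, v)) for v in doc_vecs]
    return max(range(len(dots)), key=dots.__getitem__)


if __name__ == "__main__":
    docs = [["cat", "sat"], ["dog", "ran"], ["cat", "cat", "nap"]]
    print(bm25_scores(["cat"], docs))
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    print(mips([1.0, 0.0], [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]))
```

A model that assigns probability 0.25 to every observed token has a perplexity of about 4, which matches the reading of perplexity as an effective number of equally likely choices. The exhaustive loop in mips is linear in the number of vectors; real retrieval systems would typically swap it for an approximate index.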