RAG: Retrieval-Augmented Generation—enhancing LLMs by retrieving relevant external data during generation
LLM: Large Language Model—a neural network trained on large text corpora that can recognize, summarize, translate, predict, and generate text
Entropy: A measure of the uncertainty or unpredictability in the model's next-token prediction distribution
Self-attention: A mechanism in Transformers that relates different positions of a single sequence to compute a representation of that sequence
BM25: Best Matching 25—a keyword-based ranking function used by search engines to estimate the relevance of documents to a search query from term frequency, inverse document frequency, and document length
Greedy decoding: A generation strategy where the model always picks the single most likely next token
F1 score: A metric that scores the generated answer against the ground truth as the harmonic mean of precision and recall, typically computed over the tokens the two share
EM: Exact Match—a metric that checks whether the generated answer matches the ground truth exactly, typically after normalizing case and punctuation
Stopwords: Common words (like 'the', 'is', 'at') filtered out because they carry little semantic meaning
CoT: Chain-of-Thought—a prompting technique encouraging the model to generate intermediate reasoning steps
Dense retrieval: Retrieval based on semantic vector embeddings rather than keyword matching
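
Several of the terms above (Entropy, Greedy decoding, EM, F1 score) are easier to follow with a little arithmetic. The following is a minimal Python sketch, not a reference implementation: the function names are hypothetical, and the text normalization (lowercasing and stripping punctuation) is an assumption, since evaluation scripts differ in the details.

```python
import math
import re
from collections import Counter


def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def greedy_pick(probs):
    """One greedy decoding step: index of the single most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def exact_match(prediction, truth):
    """EM: True only if the normalized token sequences are identical."""
    return normalize(prediction) == normalize(truth)


def f1_score(prediction, truth):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred, gold = normalize(prediction), normalize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # share of predicted tokens that are correct
    recall = overlap / len(gold)     # share of ground-truth tokens recovered
    return 2 * precision * recall / (precision + recall)
```

For example, with the ground truth "Paris", the prediction "the city of Paris" fails exact match but still earns partial F1 credit: one of its four tokens is correct (precision 0.25) and the single ground-truth token is recovered (recall 1.0). A uniform distribution over two tokens has entropy ln 2, while a distribution that puts all its mass on one token has entropy 0.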