RAG: Retrieval-Augmented Generation—AI systems that enhance generation quality by retrieving external documents to use as context
RA-LLMs: Retrieval-Augmented Large Language Models—the specific application of RAG techniques to billion-parameter foundation models
Bi-encoder: A retrieval architecture in which the query and the document are encoded separately by two encoders (often sharing weights), and relevance is scored by the similarity of the resulting vectors
Dense Retrieval: Retrieval based on semantic vector similarity (embeddings) rather than keyword matching
Sparse Retrieval: Retrieval based on lexical (keyword) matching with term-weighting schemes such as TF-IDF or BM25
In-Context Learning (ICL): Providing examples or context in the prompt to guide the LLM's behavior without updating its weights
Hallucination: The generation of factually incorrect or nonsensical information by an LLM
Token Retrieval: Retrieving information at the granularity of individual tokens rather than whole documents, which suits matching rare patterns
Hypothetical Document Embeddings (HyDE): A method where an LLM generates a 'hypothetical' document that answers the query; the embedding of that generated document is then used to retrieve real documents
Chain-of-Thought (CoT): Prompting strategy where the model generates intermediate reasoning steps before the final answer
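
The Sparse Retrieval and Dense Retrieval entries above can be contrasted with a minimal sketch. It is illustrative only: the function names (`bm25_scores`, `dense_scores`), the whitespace tokenizer, the BM25 variant (the common form with 1 added inside the logarithm of the IDF term), and the defaults `k1=1.5`, `b=0.75` are assumptions. The two-dimensional "embeddings" are hand-written stand-ins for what a trained bi-encoder would produce, not real model output.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems use better ones).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Sparse retrieval: score each document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact lexical matches contribute
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dense_scores(query_vec, doc_vecs):
    """Dense retrieval: score documents by embedding similarity to the query."""
    return [cosine(query_vec, d) for d in doc_vecs]


docs = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a kitten rested on the rug",
]
print("sparse:", bm25_scores("cat on mat", docs))

# Toy embeddings standing in for bi-encoder output (hypothetical values).
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
query_vec = [1.0, 0.0]
print("dense: ", dense_scores(query_vec, doc_vecs))
```

In this toy setup the sparse scorer credits the third document only for the shared word "on", while the dense scorer rates it close to the first document because the assumed embeddings place "kitten ... rug" near "cat ... mat".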