GRIT: Generative Representational Instruction Tuning—a method to train LLMs for both text generation and embedding capabilities simultaneously
MTEB: Massive Text Embedding Benchmark—a comprehensive suite of datasets for evaluating text embedding models
in-batch negatives: A contrastive learning technique where other samples in the same training batch serve as negative examples for a given query-document pair
bidirectional attention: Attention mechanism where each token can attend to both earlier and later tokens in the sequence (unlike causal attention, which only looks back)
causal attention: Attention mechanism where tokens can only attend to previous tokens, standard in generative models like GPT
RAG: Retrieval-Augmented Generation, a setup in which a system first retrieves relevant documents and then conditions the language model's generated answer on them
BF16: Bfloat16, a 16-bit floating-point format that keeps the 8-bit exponent (and so the dynamic range) of 32-bit floats but has only 7 mantissa bits of precision, used to reduce memory use and speed up training
Bi-Encoder: An architecture where query and document are encoded separately into vectors, allowing fast retrieval via dot product
Cross-Encoder: An architecture where query and document are processed together by the model to output a relevance score, more accurate but computationally expensive
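KTO: Kahneman-Tversky Optimization, an alignment tuning method for language models that optimizes a prospect-theory model of human utility, learning from binary desirable/undesirable labels rather than pairwise preference comparisons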
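
To make the "in-batch negatives" and "Bi-Encoder" entries concrete, here is a minimal sketch of a contrastive loss over one batch. It assumes the query and document vectors have already been produced by separate encoder passes; the function name in_batch_negatives_loss, the temperature value, and the toy vectors are illustrative assumptions, not taken from any particular implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negatives_loss(query_vecs, doc_vecs, temperature=0.05):
    """Mean cross-entropy where doc_vecs[i] is the positive for query_vecs[i]
    and every other document in the batch serves as a negative.

    The temperature default is an arbitrary illustrative choice.
    """
    assert len(query_vecs) == len(doc_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        # Bi-encoder scoring: one dot product per (query, document) pair.
        logits = [dot(q, d) / temperature for d in doc_vecs]
        # Numerically stable log-sum-exp over all documents in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matching document.
        total += log_z - logits[i]
    return total / len(query_vecs)


# Toy, hand-written embeddings (hypothetical data for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
docs_aligned = [[1.0, 0.0], [0.0, 1.0]]  # each query matches its own document
docs_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each query matches the other document

loss_aligned = in_batch_negatives_loss(queries, docs_aligned)
loss_swapped = in_batch_negatives_loss(queries, docs_swapped)
```

The loss is small when each query scores highest against its own document and large when an in-batch negative outscores the positive, which is the signal that pulls matching pairs together during training.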