Bi-encoder: A retrieval model that encodes query and document independently into single vectors, allowing fast search but losing fine-grained interaction details
Cross-encoder: A model that processes query and document together in a single pass, offering high accuracy at high computational cost
Late-interaction: A mechanism (like ColBERT) that encodes query and document independently but preserves token-level embeddings, delaying interaction until the final scoring step to balance speed and accuracy
ModernBERT: An updated BERT architecture optimized for longer context windows (up to 8,192 tokens) and efficiency
ColBERT: Contextualized Late Interaction over BERT, a retrieval model that sums maximum similarities between query and document token embeddings
Hard Negative Mining: Training strategy where the model is shown incorrect documents that are difficult to distinguish from the correct one (e.g., highly lexically similar but semantically different)
Recall@k: The percentage of queries where the correct document is found within the top k retrieved results
MIRAGE: A medical question-answering benchmark designed to test factuality and retrieval-grounded performance
MaxSim: Maximum Similarity operation used in ColBERT to find the best matching document token for each query token
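
The MaxSim, ColBERT, and Recall@k entries above can be made concrete with a small sketch. It assumes token embeddings are plain lists of floats and uses the dot product as the similarity measure (an assumption; with normalized embeddings this equals cosine similarity). The function names `maxsim_score` and `recall_at_k` are hypothetical and chosen for illustration, not taken from any library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token, take the similarity of
    its best-matching document token (MaxSim), then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def recall_at_k(rankings: List[List[str]], relevant: List[str], k: int) -> float:
    """Fraction of queries whose correct document appears in the top k results.
    rankings[i] is the ranked list of document ids for query i;
    relevant[i] is the id of the single correct document for query i."""
    hits = sum(1 for ranked, rel in zip(rankings, relevant) if rel in ranked[:k])
    return hits / len(relevant)


# Toy data: two query tokens, two candidate documents (2-dimensional embeddings).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [0.5, 0.0]]              # matches only the first query token

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Because each query token is matched separately, `doc_a` outscores `doc_b` even though both contain a perfect match for the first query token. A bi-encoder would instead compare one pooled vector per side and lose that token-level detail.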