TRIME: Training with In-batch Memories—the proposed method that uses in-batch examples as dynamic memory during training
kNN-LM: k-Nearest Neighbor Language Model—a baseline that linearly interpolates LM predictions with retrieval from a datastore at test time
local memory: Tokens from the recent past that fall within the current input segment
long-term memory: Tokens from previous segments of the same document, usually inaccessible to standard attention due to length limits
external memory: A large collection of context-target pairs from the entire training corpus or a domain-specific corpus
BM25: Best Matching 25—a ranking function used to estimate the relevance of documents to a given search query
FAISS: Facebook AI Similarity Search—a library for efficient similarity search and clustering of dense vectors
perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower values indicate better performance
contrastive loss: A loss function that pulls positive pairs (matching context-target) together and pushes negative pairs apart in vector space
continuous cache: A mechanism that stores hidden states from recent history and uses their dot-product similarity with the current hidden state to assist prediction
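
Two of the definitions above, the kNN-LM interpolation and perplexity, each reduce to a short formula. The following is a minimal Python sketch, assuming NumPy is available. The function names `interpolate` and `perplexity`, the toy distributions, and the weight `lam=0.25` are illustrative assumptions and are not taken from the source.

```python
import numpy as np


def interpolate(p_lm, p_knn, lam):
    """Linearly mix two next-token distributions.

    Returns lam * p_knn + (1 - lam) * p_lm, where `lam` (hypothetical name)
    is the interpolation weight given to the retrieval distribution.
    """
    p_lm = np.asarray(p_lm, dtype=float)
    p_knn = np.asarray(p_knn, dtype=float)
    return lam * p_knn + (1.0 - lam) * p_lm


def perplexity(target_probs):
    """Exponential of the mean negative log-probability of the true tokens."""
    target_probs = np.asarray(target_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(target_probs))))


# Toy three-word vocabulary (made-up numbers, for illustration only).
p_lm = [0.7, 0.2, 0.1]    # distribution from the language model
p_knn = [0.1, 0.8, 0.1]   # distribution built from retrieved neighbors
p_mix = interpolate(p_lm, p_knn, lam=0.25)

# Probabilities a model assigned to the correct token at each position.
ppl = perplexity([0.5, 0.5, 0.5])
```

Because both inputs sum to one and the weights sum to one, the mixed distribution is itself a valid probability distribution. A model that assigns probability 0.5 to every correct token has a perplexity of 2, which matches the intuition of choosing uniformly between two options at each step.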