kNN-LM: A language model that interpolates probabilities from a neural network with probabilities derived from retrieving similar contexts from a training datastore
datastore: A key-value store where keys are vector representations of context and values are the subsequent target tokens
PCA: Principal Component Analysis, a technique that reduces the number of dimensions in data while retaining as much of its variance as possible
FAISS: A library for efficient similarity search and clustering of dense vectors
perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower values indicate better performance
inference overhead: The extra time and computation required to generate text compared to a standard model, often due to retrieval steps
parametric LM: A standard neural language model where knowledge is stored entirely in the model weights
non-parametric LM: A model that consults external data (stored examples) at test time, explicitly memorizing training points rather than encoding them only in its weights
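
To make the kNN-LM, datastore, and perplexity entries concrete, here is a minimal Python sketch under stated assumptions. All function names, the toy vectors, the temperature, and the interpolation weight `lam` are hypothetical choices for illustration, not values from the source. A brute-force distance scan stands in for a FAISS index, and the softmax over negative squared L2 distances is one common way to weight neighbours, assumed here rather than confirmed by the text.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens."""
    # Squared L2 distance from the query to every stored key.
    # Brute force here; a library such as FAISS would replace this scan.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in nearest]
    total = sum(weights)

    # Neighbours that share a target token pool their probability mass.
    p_knn = defaultdict(float)
    for weight, (_, token) in zip(weights, nearest):
        p_knn[token] += weight / total
    return dict(p_knn)


def interpolate(p_lm, p_knn, lam=0.25):
    """Mix the parametric LM distribution with the retrieved one."""
    vocab = set(p_lm) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_lm.get(tok, 0.0)
        for tok in vocab
    }


def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Toy datastore: keys are context vectors, values are the tokens that followed.
datastore = [([0.0, 0.0], "cat"), ([1.0, 0.0], "dog"), ([5.0, 5.0], "fish")]
# Hypothetical next-token distribution from the parametric LM.
p_lm = {"cat": 0.5, "dog": 0.3, "fish": 0.2}

p_knn = knn_distribution([0.0, 0.0], datastore, k=2)
mixed = interpolate(p_lm, p_knn, lam=0.25)
```

In this toy setup the query coincides with the stored "cat" context, so retrieval shifts probability toward "cat" relative to the parametric distribution alone. The full scan over the datastore on every prediction is also where the inference overhead defined above comes from.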