Iterative RAG: A generation process in which the model retrieves new documents multiple times while generating a single response (e.g., after every sentence or every token)
KNN-LM: K-Nearest Neighbor Language Model, a token-level iterative RAG method that interpolates the LM's next-token distribution with a distribution computed from retrieved nearest neighbors
Speculative Retrieval: Predicting the result of a retrieval operation from a local cache so that generation can proceed, and deferring the actual, expensive retrieval to a later verification step
Batched Verification: Checking the validity of multiple speculative steps simultaneously by sending a group of queries to the external retriever in parallel
Speculation Stride: The number of consecutive speculative steps performed before triggering a verification step
Prefetching: Populating the local cache with extra documents (top-k instead of top-1) during verification to increase the cache hit rate for future speculations
Spatial Locality: The tendency for consecutive retrieval queries to access adjacent documents in the knowledge base (relevant for KNN-LM)
Temporal Locality: The tendency for consecutive retrieval queries to access the exact same document repeatedly
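
The terms above fit together into a single loop, which can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the implementation of any particular system: the names `KnowledgeBase`, `speculative_generate`, `stride`, and `prefetch_k` are hypothetical, a word-overlap score stands in for a real retriever, and the query sequence is fixed in advance (in real iterative RAG, later queries depend on the generated text, so a failed verification would also require rolling back and regenerating).

```python
from typing import List, Optional, Sequence

Doc = str


def overlap(query: str, doc: Doc) -> int:
    """Toy relevance score: number of distinct words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def top_k(query: str, docs: Sequence[Doc], k: int) -> List[Doc]:
    """Return the k best-scoring docs; sorted() is stable, so ties keep input order."""
    return sorted(docs, key=lambda d: -overlap(query, d))[:k]


class KnowledgeBase:
    """Stand-in for the expensive external retriever (hypothetical)."""

    def __init__(self, docs: Sequence[Doc]) -> None:
        self.docs = list(docs)
        self.calls = 0  # number of batched round trips made so far

    def batch_retrieve(self, queries: Sequence[str], k: int) -> List[List[Doc]]:
        self.calls += 1  # one round trip, however many queries are in the batch
        return [top_k(q, self.docs, k) for q in queries]


def speculative_generate(queries: Sequence[str], kb: KnowledgeBase,
                         stride: int = 3, prefetch_k: int = 2) -> List[Doc]:
    """Return the verified top-1 document for every retrieval step."""
    cache: List[Doc] = []
    used: List[Doc] = []
    i = 0
    while i < len(queries):
        window = queries[i:i + stride]  # speculation stride
        # Speculative retrieval: answer each query from the local cache only.
        guesses: List[Optional[Doc]] = [
            top_k(q, cache, 1)[0] if cache else None for q in window
        ]
        # Batched verification: one call to the knowledge base for the window.
        truths = kb.batch_retrieve(window, prefetch_k)
        accepted = 0
        for guess, ranked in zip(guesses, truths):
            # Prefetching: keep the whole top-k in the cache, not just top-1.
            cache.extend(d for d in ranked if d not in cache)
            used.append(ranked[0])  # always commit the verified document
            accepted += 1
            if guess != ranked[0]:
                break  # mis-speculation: discard the later speculative steps
        i += accepted
    return used


docs = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "the eiffel tower is in paris",
    "the rhine flows through germany",
]
queries = [
    "capital of france", "france capital paris", "paris capital city",
    "germany capital", "capital of germany berlin", "berlin germany",
]

kb = KnowledgeBase(docs)
used = speculative_generate(queries, kb, stride=3, prefetch_k=2)
baseline = [top_k(q, docs, 1)[0] for q in queries]  # one retrieval per step
print(used == baseline, kb.calls)  # prints: True 3
```

In this toy run the consecutive queries keep hitting the same document (temporal locality), so most cache guesses are confirmed and six retrieval steps need only three round trips to the knowledge base, while the documents used are identical to those of step-by-step retrieval. Setting `stride=1` degenerates to one round trip per step.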