KV Cache: A mechanism to store Key and Value states of previous tokens during LLM inference to avoid re-computing them at every step
Prefill Phase: The initial phase of LLM inference where the entire input prompt is processed in parallel to generate the initial KV cache
Decoding Phase: The sequential phase of LLM inference where tokens are generated one by one, attending to the cached keys and values
RoPE: Rotary Position Embedding, a method that encodes token positions by rotating query and key vectors through position-dependent angles; widely used in modern LLMs such as LLaMA
Pseudo Queries: Artificially constructed query vectors used to probe the importance of cached tokens; in DapQ, they are defined by future positional IDs rather than semantic content
Eviction: The process of removing less important Key-Value pairs from the cache to save memory
NIAH: Needle-In-A-Haystack, a benchmark that tests an LLM's ability to retrieve a specific piece of information buried in a very long context
TTFT: Time-To-First-Token, the latency from receiving the prompt to emitting the first output token
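
To make the terms above concrete, the following is a minimal single-head sketch of how a KV cache is filled during prefill, extended during decoding, and shrunk by eviction. It is illustrative only: the names ToyKVCache, attend, probe_query, and budget are hypothetical, and the probe query is an arbitrary hand-picked vector. It does not reproduce how DapQ constructs its pseudo queries from future positional IDs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over cached keys/values.

    Returns (output_vector, attention_weights).
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


class ToyKVCache:
    """Hypothetical toy cache: parallel lists of keys, values, and positions."""

    def __init__(self):
        self.keys, self.values, self.positions = [], [], []
        self.next_position = 0  # position ID of the next token to be cached

    def prefill(self, keys, values):
        """Prefill phase: cache K/V for every prompt token at once."""
        self.keys, self.values = list(keys), list(values)
        self.positions = list(range(len(keys)))
        self.next_position = len(keys)

    def append(self, key, value):
        """Decoding phase: cache K/V for one newly generated token."""
        self.keys.append(key)
        self.values.append(value)
        self.positions.append(self.next_position)
        self.next_position += 1

    def evict(self, probe_query, budget):
        """Eviction: keep the `budget` entries the probe query attends to most."""
        if len(self.keys) <= budget:
            return
        _, weights = attend(probe_query, self.keys, self.values)
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        keep = sorted(ranked[:budget])  # preserve original token order
        self.keys = [self.keys[i] for i in keep]
        self.values = [self.values[i] for i in keep]
        self.positions = [self.positions[i] for i in keep]


# Usage: a 4-token prompt with 2-dimensional keys and values (made-up numbers).
cache = ToyKVCache()
cache.prefill(
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
)
cache.evict(probe_query=[1.0, 0.5], budget=2)
print(cache.positions)  # positions of the surviving prompt tokens

cache.append(key=[0.5, 0.5], value=[1.0, 1.0])  # one decoding step
output, weights = attend([1.0, 0.5], cache.keys, cache.values)
print(cache.positions, output)
```

Note that evicted entries leave gaps in the position list: the cache keeps each surviving token's original position ID, and new tokens continue from the true sequence length rather than from the shrunken cache size.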