L1 Cache: The active context window used for generation. It is small, fast, and expensive, analogous to a CPU's L1 cache
Demand Paging: Loading data into memory only when explicitly requested (faulted in) rather than keeping everything resident
Page Fault: An event where the model requests content that has been evicted; the system must retrieve it from the backing store
Structural Waste: Tokens occupying context that serve no functional purpose, such as unused tool schemas or stale outputs
Thrashing: A pathological state where the system spends more resources moving data in and out of memory (faulting) than performing useful work
Phantom Tools: Tool definitions injected by the proxy (invisible to the client framework) that allow the model to send control signals to the memory manager
Working Set: The subset of information the model is actively using at a given time
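
To make several of these terms concrete, here is a minimal sketch of a demand pager over a token-budgeted context. It is an illustration only, not the system's actual implementation: the class name `ContextPager`, the whitespace-based token estimate, and the least-recently-used eviction policy are all assumptions chosen for brevity.

```python
from collections import OrderedDict


class ContextPager:
    """Toy demand pager (hypothetical): a token-budgeted context over a backing store."""

    def __init__(self, backing_store, budget_tokens):
        self.backing_store = backing_store  # page id -> text, always retrievable
        self.budget_tokens = budget_tokens  # capacity of the context window
        self.resident = OrderedDict()       # page id -> text, oldest use first
        self.accesses = 0
        self.faults = 0

    @staticmethod
    def cost(text):
        # Crude token estimate (assumption): whitespace-separated words.
        return len(text.split())

    def used_tokens(self):
        return sum(self.cost(t) for t in self.resident.values())

    def access(self, page_id):
        """Return a page, faulting it in from the backing store if it is not resident."""
        self.accesses += 1
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # mark as most recently used
            return self.resident[page_id]
        self.faults += 1                        # page fault: content was not resident
        text = self.backing_store[page_id]
        # Evict least recently used pages until the new page fits.
        # A page larger than the whole budget is still admitted in this toy version.
        while self.resident and self.used_tokens() + self.cost(text) > self.budget_tokens:
            self.resident.popitem(last=False)
        self.resident[page_id] = text
        return text

    def fault_rate(self):
        return self.faults / self.accesses if self.accesses else 0.0


store = {"a": "one two three", "b": "four five six", "c": "seven eight nine"}

# Working set {a, b} fits in the budget: only the first touch of each page faults.
fits = ContextPager(store, budget_tokens=6)
for pid in ["a", "b", "a", "b"]:
    fits.access(pid)

# Working set {a, b, c} exceeds the budget: every access faults (thrashing).
thrash = ContextPager(store, budget_tokens=6)
for pid in ["a", "b", "c", "a", "b", "c"]:
    thrash.access(pid)

print(fits.fault_rate(), thrash.fault_rate())
```

In the first run the working set fits and the pager faults only on first access; in the second, cycling through three pages with room for two makes every access a fault, which is the thrashing condition defined above.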