StateLM: Stateful Language Models—the proposed class of models that can actively manage their context window via tool use.
Pensieve paradigm: A framework in which the model manages its own memory by extracting key information into notes and then deleting the raw context; named after the memory-storing artifact in Harry Potter.
deleteContext: A specific tool introduced in this paper that allows the model to remove a previous message or observation from its current context window.
sawtooth context: A context length profile that rises (reading data) and falls (deleting data), contrasting with the linear growth of standard models.
GRPO: Group Relative Policy Optimization, an RL algorithm that estimates each sample's advantage by comparing its reward to the average reward of a group of responses sampled for the same prompt, rather than relying on a learned value function.
SFT: Supervised Fine-Tuning—training the model on expert demonstrations before applying RL.
context engineering: The practice of manually structuring the information fed into an LLM's prompt; StateLM automates this internally.
RAG: Retrieval-Augmented Generation—fetching external data to add to the context; StateLM improves upon this by managing the retrieved data's lifecycle.
BrowseComp-Plus: A deep research benchmark used in the paper to evaluate the model's ability to conduct extensive web-based investigations.
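
To make the GRPO entry above concrete, here is a minimal sketch of a group-relative advantage computation. The function name `group_relative_advantages`, the `eps` parameter, and the example rewards are illustrative assumptions, not taken from the paper. Dividing by the group's standard deviation is a common normalization in GRPO implementations but is not stated in this glossary, so treat it as an assumption as well.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per sampled response in a group.

    The baseline is the group's mean reward, so no learned value
    function is needed. Scaling by the population standard deviation
    is an assumption of this sketch (not stated in the glossary).
    """
    if not rewards:
        return []
    baseline = mean(rewards)   # group average acts as the baseline
    scale = pstdev(rewards)    # spread of rewards within the group
    return [(r - baseline) / (scale + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group average receive a positive advantage and those below it a negative one. If every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.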