REPL: Read-Eval-Print Loop—an interactive programming environment where the model can execute code and see results immediately
Context Rot: The phenomenon where LLM reasoning quality degrades steeply as input prompt length increases, even when the prompt still fits within the model's context window
Symbolic Recursion: The ability of the model to write code that invokes the model itself on programmatically defined subsets of data
Context Compaction: A baseline strategy where context is repeatedly summarized (compressed) to fit within a model's window, often losing detail
S-NIAH: Single Needle-In-A-Haystack—a benchmark task requiring the retrieval of a specific small piece of information from a large text
OOLONG: A long-context reasoning benchmark whose answers depend on nearly all entries of the input, so the required processing grows linearly with input length
OOLONG-Pairs: A variation of OOLONG requiring quadratic processing (aggregating pairs of chunks), extremely difficult for standard Transformers
CodeAct: An agent framework that allows LLMs to execute code actions; used here as a baseline
ReAct: Reasoning and Acting, a prompting paradigm where models interleave reasoning traces with actions
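
As an illustration of the Symbolic Recursion entry above, here is a minimal sketch of code that calls a model on programmatically defined subsets of a long input, applied to an S-NIAH-style search. The names `chunk_text`, `stub_llm_query`, and `recursive_search` are hypothetical and do not come from the source, and `stub_llm_query` is a plain string check standing in for a real model call so that the example runs on its own.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def stub_llm_query(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports whether 'NEEDLE' appears."""
    return "found" if "NEEDLE" in prompt else "not found"


def recursive_search(
    context: str,
    chunk_size: int,
    llm_query: Callable[[str], str] = stub_llm_query,
) -> List[int]:
    """Call the (stubbed) model once per chunk and return the indices that report a hit."""
    hits = []
    for index, chunk in enumerate(chunk_text(context, chunk_size)):
        # Each sub-call sees only one small chunk, never the full context.
        if llm_query(chunk) == "found":
            hits.append(index)
    return hits


# A 506-character haystack with the needle at offset 250, inside the third chunk.
haystack = "x" * 250 + "NEEDLE" + "y" * 250
print(recursive_search(haystack, chunk_size=100))  # prints [2]
```

Fixed-size chunking is the simplest choice here and would miss a needle that straddles a chunk boundary; overlapping chunks would be one way to address that.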