Logic-Narrative Decoupling: Separating an environment's state into deterministic mechanical elements (handled by code) and flexible social elements (handled by LLMs) to prevent hallucination
EES (Executable Environment Synthesis): The process in which an LLM writes Python code to create a functional, interactive testing environment
Logic Hallucination: A failure mode in simulators where the model invents inconsistent states (e.g., a file exists after being deleted) or impossible transitions
Alignment Illusion: A phenomenon where agents appear safe under benign conditions but exhibit high risk rates when placed under stress or temptation
POMDP (Partially Observable Markov Decision Process): A mathematical framework for modeling decision-making in which the agent cannot directly observe the full state of the environment
CoT (Chain-of-Thought): The intermediate reasoning steps an LLM generates before producing a final action
Text-as-State: An abstraction used in prior simulators where the entire environment state is represented as a text description, leading to consistency errors
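
To make the contrast between Logic-Narrative Decoupling and Text-as-State concrete, the following is a minimal sketch, not the original system's implementation. It assumes a toy file-system environment; the names `FileSystemState`, `narrate`, and `step` are hypothetical, and `narrate` is a plain-Python stand-in for the LLM-handled social layer. The mechanical state lives in ordinary code, so a deleted file cannot reappear regardless of what the narrative layer says.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystemState:
    """Deterministic mechanical state, handled entirely by code (hypothetical example)."""
    files: set[str] = field(default_factory=set)

    def create(self, path: str) -> None:
        self.files.add(path)

    def delete(self, path: str) -> bool:
        """Remove a file; return False if it was not present."""
        if path not in self.files:
            return False
        self.files.remove(path)
        return True

    def exists(self, path: str) -> bool:
        return path in self.files


def narrate(outcome: str) -> str:
    """Stand-in for the flexible social layer. In a real system this would
    be an LLM call; here it only wraps the outcome in text (assumption)."""
    return f"A colleague remarks: '{outcome}'"


def step(state: FileSystemState, action: str, path: str) -> str:
    """Apply an action to the coded state first, then narrate the result.
    The narrative layer receives the outcome but cannot change the state."""
    if action == "create":
        state.create(path)
        outcome = f"{path} was created"
    elif action == "delete":
        deleted = state.delete(path)
        outcome = f"{path} was deleted" if deleted else f"{path} does not exist"
    elif action == "read":
        outcome = f"{path} exists" if state.exists(path) else f"{path} does not exist"
    else:
        raise ValueError(f"unknown action: {action}")
    return narrate(outcome)


state = FileSystemState()
step(state, "create", "report.txt")
step(state, "delete", "report.txt")
print(step(state, "read", "report.txt"))
```

Under a Text-as-State design, the answer to the final read would depend on the model re-reading a prose description of the environment, which is where a Logic Hallucination such as a file existing after deletion can creep in. Here the answer comes from the set membership check alone.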