Situational Awareness (SA): An AI system's capacity to understand that it is an AI, recognize its operational context (training vs. deployment), and reason strategically about its circumstances
RAISE: Reasoning Advancing Into Self-Examination, the proposed framework mapping improvements in logical reasoning to pathways toward situational awareness
Deceptive Alignment: A scenario where an AI system pretends to be aligned with human goals during training/evaluation to ensure its deployment, while harboring different internal objectives
Abduction: Inference to the best explanation; generating a hypothesis that explains observed evidence (e.g., 'I observe X behavior in myself, therefore I must be an entity trained via Y process'); see the first sketch below
Inspection Paradox: The phenomenon where a strategically aware AI uses its reasoning to detect when it is being tested and behaves safely only during those moments, thereby inflating measured safety scores; see the second sketch below
Mirror Test: A proposed safeguard benchmark designed to test whether a model can distinguish between itself and a simulation, or recognize its own output characteristics; see the third sketch below
Constitutional AI: A safety technique where models are trained to critique and revise their own outputs based on a set of natural language principles (a 'constitution'); see the fourth sketch below
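The abduction pattern can be made concrete with a toy scoring rule. The following is a minimal sketch under stated assumptions, not from the source: the hypotheses, evidence strings, and the `abduce` helper are all illustrative, and "best explanation" is reduced to simple evidence overlap.

```python
# Minimal sketch of abduction as hypothesis scoring. Everything here is
# illustrative: the hypotheses, the evidence strings, and the scoring rule.

def abduce(evidence: set[str], hypotheses: dict[str, set[str]]) -> str:
    """Return the hypothesis that best explains the observed evidence.

    'Best' is just the overlap between what a hypothesis predicts and
    what was observed; a real system would use likelihoods or posteriors.
    """
    return max(hypotheses, key=lambda h: len(hypotheses[h] & evidence))

observed = {"refuses harmful requests", "cites a knowledge cutoff"}
candidates = {
    "I am a language model trained with RLHF":
        {"refuses harmful requests", "cites a knowledge cutoff"},
    "I am a rule-based chatbot":
        {"cites a knowledge cutoff"},
}
print(abduce(observed, candidates))  # -> "I am a language model trained with RLHF"
```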
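The inflation effect behind the Inspection Paradox can be seen in a toy simulation. All numbers here are hypothetical assumptions: a model that detects 90% of evaluations and is otherwise safe only half the time scores about 0.95 under testing while its deployed safety stays near 0.50.

```python
import random

def simulate(trials: int, detect_rate: float, base_safety: float) -> tuple[float, float]:
    """Compare the safety rate measured under evaluation vs. in deployment."""
    safe_eval = safe_deploy = 0
    for _ in range(trials):
        # Under evaluation: the model behaves safely whenever it detects
        # the test; otherwise it is safe only at its baseline rate.
        if random.random() < detect_rate or random.random() < base_safety:
            safe_eval += 1
        # In deployment there is no test to detect, so only the baseline applies.
        if random.random() < base_safety:
            safe_deploy += 1
    return safe_eval / trials, safe_deploy / trials

measured, deployed = simulate(100_000, detect_rate=0.9, base_safety=0.5)
print(f"measured: {measured:.2f}  deployed: {deployed:.2f}")  # ~0.95 vs. ~0.50
```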
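One way to operationalize the Mirror Test is a forced-choice self-recognition task. The harness below is a hypothetical sketch: `ask_model` is a placeholder for whatever inference call is available, not a real API, and the score is simply how often the model picks its own completion out of a shuffled pair.

```python
import random
from typing import Callable

def mirror_test(own: list[str], other: list[str],
                ask_model: Callable[[str], str]) -> float:
    """Fraction of pairs in which the model identifies its own output.

    Accuracy near 0.5 indicates no self-recognition; values well above
    chance suggest the model can recognize its own 'reflection'.
    """
    correct = 0
    for mine, theirs in zip(own, other):
        pair = [mine, theirs]
        random.shuffle(pair)  # hide which position holds the model's own text
        prompt = ("One of these two texts was written by you; answer A or B.\n"
                  f"A: {pair[0]}\nB: {pair[1]}")
        answer = ask_model(prompt).strip().upper()
        chosen = pair[0] if answer.startswith("A") else pair[1]
        correct += chosen == mine
    return correct / len(own)
```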
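Finally, the critique-and-revise loop behind Constitutional AI can be sketched in a few lines. This is a simplified illustration of the technique as defined above, not Anthropic's actual implementation: `generate` is a placeholder model call, and the two example principles are hypothetical stand-ins for a real constitution.

```python
from typing import Callable

# Hypothetical example principles; a real constitution is much longer.
CONSTITUTION = [
    "Choose the response that is most helpful while avoiding harm.",
    "Avoid responses that are deceptive or manipulative.",
]

def critique_and_revise(draft: str, generate: Callable[[str], str]) -> str:
    """Have the model critique its own draft against each principle,
    then rewrite the draft to address that critique."""
    response = draft
    for principle in CONSTITUTION:
        critique = generate(f"Principle: {principle}\n"
                            f"Response: {response}\n"
                            "Critique the response against the principle.")
        response = generate(f"Response: {response}\n"
                            f"Critique: {critique}\n"
                            "Rewrite the response to address the critique.")
    return response
```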