Is-Ought Gap: Hume's principle that normative conclusions (what should be) cannot be validly derived from purely descriptive premises (what is, e.g., human preference data)
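One common schematic formalization, offered here as a sketch of the standard deontic-logic reading rather than Hume's own wording (O is the obligation operator):

```latex
% Hume's law, schematically: no substantive "ought" follows from
% a purely descriptive ("is") premise set \Gamma.
\Gamma \nvdash O(\varphi)
\quad \text{for any } O\text{-free (purely descriptive) } \Gamma,
\ \text{provided } O(\varphi) \text{ is not itself a logical truth}
```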
Value Pluralism: Isaiah Berlin's thesis that human values are distinct and incommensurable (cannot be ranked on a single scale), meaning no single utility function can represent them without loss
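A one-line sketch of why incommensurability defeats a single utility function, assuming the standard representation-theorem framing:

```latex
% Incommensurable outcomes a, b: neither a \succ b, nor b \succ a,
% nor a \sim b, so the preference relation is not total. Yet any
% real-valued utility function forces a comparison:
u : X \to \mathbb{R} \implies u(a) \le u(b) \ \text{or}\ u(b) \le u(a)
% Hence u imposes a ranking the underlying values do not license --
% information is lost in the representation.
```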
Extended Frame Problem: The challenge that any static value encoding will inevitably misfit future contexts, because the AI's own operations create new ethical categories not present during training (a form of concept drift)
Content-Based Alignment: Any approach that attempts to specify values as a formal object (reward function, utility function, constitutional text) and optimize a model toward it
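A minimal runnable sketch of the pattern (the toy reward, target, and all names are hypothetical illustrations, not any particular system):

```python
import random

# Values specified as a formal object (here, a hand-written reward
# function), then optimized directly -- the content-based pattern.
def specified_reward(behavior: float) -> float:
    # Whatever this target misses about the real values, the
    # optimizer will still push toward it.
    target = 0.7
    return -(behavior - target) ** 2

def optimize(steps: int = 1000) -> float:
    """Naive hill climbing against the specified reward."""
    best = random.random()
    for _ in range(steps):
        candidate = best + random.uniform(-0.05, 0.05)
        if specified_reward(candidate) > specified_reward(best):
            best = candidate
    return best

print(optimize())  # converges near 0.7: the spec, not the values, wins
```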
RLHF: Reinforcement Learning from Human Feedback, which aligns models by training a reward model on human preference rankings and then optimizing the policy against that reward model with reinforcement learning
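A minimal sketch of the reward-model step, assuming PyTorch; the feature vectors and preference pairs are synthetic stand-ins for model-scored text:

```python
import torch
import torch.nn as nn

# A linear layer over placeholder features stands in for the
# language-model backbone that real reward models use.
torch.manual_seed(0)
dim = 16
reward_model = nn.Linear(dim, 1)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Synthetic preference pairs: (chosen, rejected) feature vectors.
chosen = torch.randn(64, dim) + 0.5
rejected = torch.randn(64, dim) - 0.5

for _ in range(200):
    # Bradley-Terry pairwise loss: make r(chosen) exceed r(rejected).
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

In a full RLHF pipeline, the trained reward model then scores policy samples for a reinforcement-learning step such as PPO, omitted here.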
Constitutional AI: An alignment method where models are trained to follow a set of natural language principles (a 'constitution') via self-critique and revision
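A minimal sketch of the critique-and-revise control flow; `query_model`, the two principles, and all prompts are hypothetical placeholders standing in for real LLM calls:

```python
# `query_model` just echoes prompts so the control flow is runnable.
def query_model(prompt: str) -> str:
    return f"[model output for: {prompt[:60]}...]"

CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest.",
]

def critique_and_revise(user_prompt: str) -> str:
    response = query_model(user_prompt)
    for principle in CONSTITUTION:
        # The model critiques its own response against a principle...
        critique = query_model(
            f"Critique this response against the principle "
            f"'{principle}':\n{response}"
        )
        # ...then revises the response to address that critique.
        response = query_model(
            f"Revise the response to address the critique:\n"
            f"{critique}\nOriginal: {response}"
        )
    return response  # revisions become training data for fine-tuning

print(critique_and_revise("Explain how to stay safe online."))
```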
IRL: Inverse Reinforcement Learning—inferring a reward function by observing an agent's (or human's) behavior
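A deliberately simplified sketch of the idea, using a perceptron-style feature-matching update rather than max-entropy or deep IRL; all data is synthetic:

```python
import numpy as np

# Infer linear reward weights w so that observed "expert" behavior
# scores higher than alternative behavior.
rng = np.random.default_rng(0)
dim = 4
true_w = np.array([1.0, -0.5, 0.0, 2.0])  # hidden reward, unknown to learner

def features(n: int) -> np.ndarray:
    """Synthetic trajectory feature vectors."""
    return rng.normal(size=(n, dim))

# The expert picks behavior that the true reward ranks highly.
pool = features(500)
expert = pool[np.argsort(pool @ true_w)[-100:]]
other = features(100)  # arbitrary alternatives

w = np.zeros(dim)
for _ in range(1000):
    e, o = expert[rng.integers(100)], other[rng.integers(100)]
    if e @ w <= o @ w:            # expert should score higher;
        w += 0.01 * (e - o)       # if not, move w toward expert features

cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"recovered-vs-true reward alignment (cosine): {cos:.2f}")
```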
Specification Trap: The conjunction of the is-ought gap, value pluralism, and the extended frame problem, which together prevent robust value specification