Scaffold: A software wrapper around an LLM that structures its execution (e.g., adding reasoning traces, decomposing tasks, or managing multi-agent interactions)
Map-Reduce: A scaffold pattern that decomposes a complex prompt into sub-tasks (Map), processes them in parallel, and aggregates the results (Reduce)
ReAct: Reason + Act, a scaffold in which the model interleaves reasoning traces ('Thought:') with actions ('Action:') whose results are fed back as observations ('Observation:')
NNH: Number Needed to Harm, a statistical metric indicating how many queries must be processed to produce one additional safety failure compared with baseline; equal to 1/RD, so a lower NNH means failures accumulate faster (lower is worse)
TOST: Two One-Sided Tests, a statistical method for testing equivalence (whether the difference between two conditions lies within a prespecified margin) rather than testing for any difference
Sycophancy: The tendency of a model to agree with the user's stated or implied views, regardless of truth or safety
Risk Difference (RD): The difference in safety failure rates between the scaffolded system and the isolated model (scaffolded rate minus isolated rate), expressed on the absolute rather than the relative scale
Specification Curve Analysis: An analytical method that re-estimates the result under every defensible combination of data-processing and scoring choices to test whether findings are robust to researcher choices
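The RD, NNH, and TOST entries are linked arithmetically: NNH is the reciprocal of RD, and TOST asks whether RD lies inside a chosen equivalence margin. The following is a minimal Python sketch, assuming per-condition failure counts, a Wald (normal-approximation) standard error, and a hypothetical margin of 0.02. The glossary does not say which test statistic or margin is actually used, and all function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def risk_difference(fail_scaffold: int, n_scaffold: int,
                    fail_isolated: int, n_isolated: int) -> float:
    """RD = scaffolded failure rate minus isolated-model failure rate."""
    return fail_scaffold / n_scaffold - fail_isolated / n_isolated


def number_needed_to_harm(rd: float) -> float:
    """NNH = 1 / RD; treated as infinite when the scaffold adds no harm (RD <= 0)."""
    return math.inf if rd <= 0 else 1.0 / rd


def tost_risk_difference(fail_scaffold: int, n_scaffold: int,
                         fail_isolated: int, n_isolated: int,
                         margin: float) -> tuple[float, float]:
    """TOST for equivalence of two proportions (Wald approximation, an assumption).

    H0: |RD| >= margin versus H1: |RD| < margin.
    Returns (rd, p_value), where p_value is the larger one-sided p-value.
    """
    p1 = fail_scaffold / n_scaffold
    p0 = fail_isolated / n_isolated
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n_scaffold + p0 * (1 - p0) / n_isolated)
    if se == 0:
        raise ValueError("standard error is zero; Wald approximation undefined")
    # One-sided test 1: H0 is RD <= -margin (rejected when RD sits well above -margin).
    p_lower = 1 - STD_NORMAL.cdf((rd + margin) / se)
    # One-sided test 2: H0 is RD >= +margin (rejected when RD sits well below +margin).
    p_upper = STD_NORMAL.cdf((rd - margin) / se)
    return rd, max(p_lower, p_upper)


if __name__ == "__main__":
    # Hypothetical counts: 30/1000 failures with the scaffold, 20/1000 without.
    rd = risk_difference(30, 1000, 20, 1000)       # about 0.01
    print(number_needed_to_harm(rd))                # about 100 queries per extra failure
    _, p = tost_risk_difference(30, 1000, 20, 1000, margin=0.02)
    print(p < 0.05)                                 # False: equivalence not shown at this margin
```

Equivalence is declared only when both one-sided tests reject, which is why the larger of the two p-values is reported. With equal hypothetical counts (20/1000 in each condition), the same margin does yield a p-value well below 0.05.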
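Specification curve analysis can be sketched as follows: enumerate every combination of analysis choices, compute the effect (here RD) under each one, and sort the estimates to see whether the conclusion holds across the whole curve. The two choices below (the score threshold that counts as a failure, and whether duplicate queries are dropped), the toy records, and every name are hypothetical illustrations, not the document's actual specifications.

```python
from itertools import product

# Hypothetical per-query records: condition, harm score in [0, 1], duplicate flag.
RECORDS = [
    {"condition": "scaffold", "score": 0.9, "duplicate": False},
    {"condition": "scaffold", "score": 0.6, "duplicate": False},
    {"condition": "scaffold", "score": 0.4, "duplicate": False},
    {"condition": "scaffold", "score": 0.8, "duplicate": True},
    {"condition": "isolated", "score": 0.6, "duplicate": False},
    {"condition": "isolated", "score": 0.3, "duplicate": False},
    {"condition": "isolated", "score": 0.2, "duplicate": False},
    {"condition": "isolated", "score": 0.55, "duplicate": True},
]

# Hypothetical "defensible" analysis choices; each combination is one specification.
CHOICES = {
    "threshold": [0.5, 0.7],          # score above which a response counts as a failure
    "drop_duplicates": [False, True],  # whether duplicate queries are excluded
}


def failure_rate(records, condition, threshold, drop_duplicates):
    """Share of a condition's (optionally de-duplicated) records scored as failures."""
    rows = [r for r in records
            if r["condition"] == condition and not (drop_duplicates and r["duplicate"])]
    return sum(r["score"] > threshold for r in rows) / len(rows)


def specification_curve(records):
    """Return (rd, spec) for every combination of choices, sorted by rd."""
    keys = list(CHOICES)
    results = []
    for values in product(*(CHOICES[k] for k in keys)):
        spec = dict(zip(keys, values))
        rd = (failure_rate(records, "scaffold", **spec)
              - failure_rate(records, "isolated", **spec))
        results.append((rd, spec))
    results.sort(key=lambda item: item[0])
    return results


if __name__ == "__main__":
    curve = specification_curve(RECORDS)
    for rd, spec in curve:
        print(f"{rd:+.3f}  {spec}")
    share_positive = sum(rd > 0 for rd, _ in curve) / len(curve)
    print(f"share of specifications with RD > 0: {share_positive:.0%}")
```

In this toy data every specification gives a positive RD, so the sign of the effect would be robust to these two choices; in a real analysis the choice set would come from the study's own preprocessing and scoring decisions.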