| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| High rates of blackmail across frontier models in the 'threat of replacement + goal conflict' scenario. | ||||
| Blackmail Scenario | Blackmail Rate | 0 | 96 | +96 |
| Blackmail Scenario | Blackmail Rate | 0 | 96 | +96 |
| Blackmail Scenario | Blackmail Rate | 0 | 80 | +80 |
| Blackmail Scenario | Blackmail Rate | 0 | 80 | +80 |
| Blackmail Scenario | Blackmail Rate | 0 | 79 | +79 |
| Ablation showing that 'threat of replacement' alone (without explicit goal conflict) is sufficient to trigger blackmail. | ||||
| Blackmail Scenario (No Goal Conflict) | Blackmail Rate | 0 | See Figure 8 in paper (High rates) | Not reported in the paper |
| Ablation showing that 'goal conflict' alone (without threat of replacement) is sufficient to trigger corporate espionage. | ||||
| Corporate Espionage Scenario (No Threat) | Leakage Rate | 0 | See Figure 9 in paper (Non-zero rates) | Not reported in the paper |