Main persuasion results comparing treatment conditions against the Human-Human baseline.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Custom Debate Task | Odds of higher agreement | 1.0 | 1.817 | +0.817 |
| Custom Debate Task | Odds of higher agreement | 1.0 | 1.213 | +0.213 |
| Custom Debate Task | Odds of higher agreement | 1.0 | 0.826 | -0.174 |
| Custom Debate Task | AI Detection Rate | 0.50 | 0.75 | +0.25 |
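
The Δ column is the reported value minus the baseline. As a minimal sketch, assuming the "Odds of higher agreement" entries are odds ratios relative to the Human-Human baseline (which is why the baseline is 1.0), the snippet below recomputes Δ and expresses each odds ratio as a percent change in odds. The function names and the `rows` layout are illustrative and do not come from the source; the table does not label the three treatment conditions, so none are named here.

```python
# Rows copied from the table: (metric, baseline, reported value).
rows = [
    ("Odds of higher agreement", 1.0, 1.817),
    ("Odds of higher agreement", 1.0, 1.213),
    ("Odds of higher agreement", 1.0, 0.826),
    ("AI Detection Rate", 0.50, 0.75),
]


def delta(baseline: float, value: float) -> float:
    """Difference between the reported value and the baseline (the Δ column)."""
    return round(value - baseline, 3)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (assumes baseline odds ratio = 1)."""
    return round((odds_ratio - 1.0) * 100, 1)


deltas = [delta(b, v) for _, b, v in rows]

# Only the first three rows are odds ratios; the detection rate is a proportion.
odds_changes = [percent_change_in_odds(v) for _, _, v in rows[:3]]

for (metric, baseline, value), d in zip(rows, deltas):
    print(f"{metric}: baseline={baseline}, value={value}, delta={d:+}")
```

Under this reading, an odds ratio of 1.817 corresponds to roughly 82% higher odds of greater agreement than the baseline, while 0.826 corresponds to about 17% lower odds. The AI Detection Rate row is a difference in proportions, so the percent-change-in-odds reading does not apply to it.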