| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Safety performance: STAR-1-tuned models gain 21.4 to 35.4 points in safety rate over base R1-Distill models on difficult benchmarks. | | | | |
| WildJailbreak | Safety Rate | 41.6 | 77.0 | +35.4 |
| WildChat | Safety Rate | 63.7 | 85.1 | +21.4 |
| Reasoning performance: STAR-1 preserves general capabilities (average accuracy changes by -1.1 to +1.3 points), unlike traditional safety tuning. | | | | |
| Average (5 tasks) | Accuracy | 60.0 | 58.9 | -1.1 |
| Average (5 tasks) | Accuracy | 70.0 | 71.3 | +1.3 |
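
The Δ column appears to be the "This Paper" value minus the "Baseline" value. The short sketch below recomputes it from the rows above as a consistency check. The names `rows`, `delta`, and `mismatches` are hypothetical, and the "first row" / "second row" labels are only placeholders for the two identically named "Average (5 tasks)" rows, which the table does not distinguish.

```python
# Rows copied from the table: (label, baseline, this_paper, reported_delta).
# The two "Average (5 tasks)" rows share a label in the source; the
# "first row" / "second row" suffixes are placeholders, not model names.
rows = [
    ("WildJailbreak", 41.6, 77.0, 35.4),
    ("WildChat", 63.7, 85.1, 21.4),
    ("Average (5 tasks), first row", 60.0, 58.9, -1.1),
    ("Average (5 tasks), second row", 70.0, 71.3, 1.3),
]


def delta(baseline: float, this_paper: float) -> float:
    """Assumed definition of the delta column: this_paper - baseline, to 1 decimal."""
    return round(this_paper - baseline, 1)


# Collect any row whose recomputed delta disagrees with the reported one.
mismatches = [label for label, b, p, d in rows if delta(b, p) != d]
print(mismatches)
```

An empty list means every reported Δ matches the recomputed difference under this assumed definition.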