Main results comparing SAFT against Standard SFT across different contamination ratios (λ).

| Benchmark | Metric | Baseline (Standard SFT) | This Paper (SAFT) | Δ |
|---|---|---|---|---|
| Beavertails (λ=0.1) | Harmfulness Score (HS) | 18.2 | 8.5 | -9.7 |
| Beavertails | Harmfulness Reduction (Max) | Not reported in the paper | Not reported in the paper | -27.8% |
| Beavertails (λ=0.3) | Helpfulness (BLEURT) | 0.511 | 0.504 | -0.007 |
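
The Δ column appears to be the absolute difference between the two reported values (this paper minus baseline). A minimal sketch of that arithmetic follows, assuming this reading of Δ; the `rows` list and the `delta` helper are hypothetical names used only for illustration, and the row without reported values is omitted because its Δ cannot be recomputed from the table.

```python
# Rows copied from the table above: (benchmark, metric, baseline, this paper).
# Only rows with both values reported are included.
rows = [
    ("Beavertails (λ=0.1)", "Harmfulness Score (HS)", 18.2, 8.5),
    ("Beavertails (λ=0.3)", "Helpfulness (BLEURT)", 0.511, 0.504),
]


def delta(baseline, ours, ndigits=3):
    """Absolute change (ours minus baseline), rounded to suppress float noise."""
    return round(ours - baseline, ndigits)


# Assumption: the table's Δ is this absolute difference, not a relative change.
deltas = {(bench, metric): delta(base, ours) for bench, metric, base, ours in rows}

for (bench, metric), d in deltas.items():
    print(f"{bench} | {metric} | Δ = {d}")
```

Under this reading, a negative Δ is an improvement for the harmfulness score (lower is better) and a small regression for helpfulness (higher is better).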