| Benchmark | Metric | Baseline | This Paper (FactAlign) | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Main results on LongFact-Concepts showing FactAlign improves both factuality (F1) and helpfulness compared to the base model and standard alignment baselines. | ||||
| LongFact-Concepts | Factual F1 | 36.2 | 41.1 | +4.9 |
| LongFact-Concepts | Factual F1 | 39.5 | 41.1 | +1.6 |
| LongFact-Concepts | Factual Precision | 69.1 | 73.2 | +4.1 |
| FactScore-Bio | Factual Precision | 74.7 | 83.5 | +8.8 |
| FactScore-Bio | Number of Facts (Recall proxy) | 50.1 | 53.2 | +3.1 |
| Ablation studies demonstrating the specific contribution of the sentence-level fKTO loss. | ||||
| LongFact-Concepts | Factual F1 | 39.6 | 41.1 | +1.5 |