Bilingual alignment (English-Chinese) results showing that AFP improves over base models and standard instruction tuning.

| Benchmark | Metric | Baseline | This Paper (AFP) | Δ |
|---|---|---|---|---|
| XNLI (Average EN/ZH) | Accuracy | 41.55 | 47.30 | +5.75 |
| XCOPA (Average EN/ZH) | Accuracy | 54.35 | 57.70 | +3.35 |
| XNLI + XCOPA (Average) | Accuracy | 46.1 | 50.7 | +4.6 |
| FLORES-101 | COMET | 26.0 | 59.3 | +33.3 |
| 5 Datasets Avg | 0-shot Accuracy | 52.94 | 55.97 | +3.03 |