| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Accuracy in the Monolingual Retrieval setting (non-English question, English documents) drops by roughly 20 to 25 points relative to the English baselines. | ||||
| XRAG-Monolingual (Average) | Accuracy | 75.40 | 55.50 | -19.90 |
| XRAG-Monolingual (Average) | Accuracy | 56.40 | 31.40 | -25.00 |
| Responding in the wrong language, measured by Response Language Correctness (RLC), can be a major failure mode in Monolingual Retrieval settings, with wrong-language rates of up to 61.10%. | ||||
| XRAG-Monolingual (Average) | Wrong Language % | 0.00 | 1.30 | +1.30 |
| XRAG-Monolingual (Average) | Wrong Language % | 0.00 | 61.10 | +61.10 |
| Controlled analysis in Multilingual Retrieval reveals that reasoning over cross-lingual documents is harder than generation. | ||||
| XRAG-Multilingual (Analysis) | Accuracy | 57.58 | 67.08 | +9.50 |
| XRAG-Multilingual (Analysis) | Accuracy | 57.58 | 58.05 | +0.47 |
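
The Δ column appears to follow the convention Δ = This Paper − Baseline; that convention is inferred from the rows rather than stated in the table. A minimal sketch that recomputes each difference from the values above (the names `rows` and `delta` are hypothetical helpers, not part of any benchmark tooling):

```python
# Rows copied from the table: (benchmark, metric, baseline, this_paper, reported_delta).
rows = [
    ("XRAG-Monolingual (Average)", "Accuracy", 75.40, 55.50, -19.90),
    ("XRAG-Monolingual (Average)", "Accuracy", 56.40, 31.40, -25.00),
    ("XRAG-Monolingual (Average)", "Wrong Language %", 0.00, 1.30, 1.30),
    ("XRAG-Monolingual (Average)", "Wrong Language %", 0.00, 61.10, 61.10),
    ("XRAG-Multilingual (Analysis)", "Accuracy", 57.58, 67.08, 9.50),
    ("XRAG-Multilingual (Analysis)", "Accuracy", 57.58, 58.05, 0.47),
]


def delta(baseline: float, result: float) -> float:
    """Assumed convention: result minus baseline, rounded to two decimals."""
    return round(result - baseline, 2)


for benchmark, metric, baseline, result, reported in rows:
    computed = delta(baseline, result)
    status = "ok" if computed == reported else "mismatch"
    print(f"{benchmark:30s} {metric:18s} {computed:+7.2f} ({status})")
```

Under this convention, a negative Δ on Accuracy and a positive Δ on Wrong Language % both indicate degradation, so the sign has to be read together with the metric.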