| Benchmark | Metric | Baseline | CoCoA (this paper) | Δ (CoCoA − Baseline) |
|---|---|---:|---:|---:|
| **QA Performance:** CoCoA consistently outperforms baselines on Llama-3-8B. | | | | |
| PopQA | Exact Match | 46.10 | 58.52 | +12.42 |
| TriviaQA | Exact Match | 78.43 | 86.33 | +7.90 |
| **Summarization Factuality:** CoCoA improves factual alignment in summarization tasks. | | | | |
| TofuEval (Main Topics) | AlignScore | 85.07 | 86.32 | +1.25 |
| XSum | AlignScore | 85.81 | 87.94 | +2.13 |
| **Long-Form QA:** CoCoA matches or beats closed models using open weights. | | | | |
| CLAPNQ | ROUGE-L | 37.72 | 42.15 | +4.43 |
| CLAPNQ | FaithScore | 90.35 | 92.45 | +2.10 |
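
Each Δ entry is the CoCoA score minus the baseline score on the same benchmark and metric. The following is a minimal sketch for re-checking that arithmetic, assuming the values are transcribed exactly as they appear in the table; the names `ROWS` and `delta_mismatches` are illustrative and do not come from the paper.

```python
# Values transcribed from the table above:
# (benchmark, metric, baseline, cocoa, reported_delta)
ROWS = [
    ("PopQA", "Exact Match", 46.10, 58.52, 12.42),
    ("TriviaQA", "Exact Match", 78.43, 86.33, 7.90),
    ("TofuEval (Main Topics)", "AlignScore", 85.07, 86.32, 1.25),
    ("XSum", "AlignScore", 85.81, 87.94, 2.13),
    ("CLAPNQ", "ROUGE-L", 37.72, 42.15, 4.43),
    ("CLAPNQ", "FaithScore", 90.35, 92.45, 2.10),
]


def delta_mismatches(rows, tol=0.005):
    """Return the rows whose reported delta differs from (cocoa - baseline).

    `tol` is half of the last reported decimal place, so differences caused
    only by floating-point representation are ignored.
    """
    bad = []
    for name, metric, baseline, cocoa, reported in rows:
        recomputed = cocoa - baseline
        if abs(recomputed - reported) >= tol:
            bad.append((name, metric, round(recomputed, 2), reported))
    return bad


if __name__ == "__main__":
    problems = delta_mismatches(ROWS)
    if problems:
        for name, metric, recomputed, reported in problems:
            print(f"{name} ({metric}): recomputed {recomputed}, reported {reported}")
    else:
        print("All reported deltas match CoCoA - Baseline.")
```

An empty list from `delta_mismatches(ROWS)` means every reported Δ agrees with the two score columns to two decimal places.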