| Benchmark | Metric | Baseline | This Paper | Δ (F1 points) |
|---|---|---:|---:|---:|
| **Baseline: Vanilla RAG.** KG-Infused RAG outperforms Vanilla RAG on all three benchmarks listed. | | | | |
| HotpotQA | F1 | 48.2 | 56.4 | +8.2 |
| 2WikiMQA | F1 | 40.5 | 58.3 | +17.8 |
| MuSiQue | F1 | 24.1 | 29.7 | +5.6 |
| **Baseline: KG-based methods that construct ad-hoc KGs.** KG-Infused RAG scores higher F1 on both benchmarks listed. | | | | |
| HotpotQA | F1 | 45.1 | 56.4 | +11.3 |
| 2WikiMQA | F1 | 42.8 | 58.3 | +15.5 |
| **Baseline: an advanced RAG system.** Integrating KG-Infused RAG raises its F1, demonstrating plug-and-play effectiveness. | | | | |
| HotpotQA | F1 | 51.8 | 58.9 | +7.1 |
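
The Δ column is the reported F1 minus the baseline F1, in absolute points. A minimal sketch for re-deriving it from the rows above, assuming that reading of the table; the group labels and the `f1_gain` helper are illustrative names, not taken from the paper:

```python
# Rows copied from the table: (comparison group, benchmark, baseline F1, reported F1).
# Group labels are hypothetical shorthand for the three comparison settings.
ROWS = [
    ("vanilla_rag", "HotpotQA", 48.2, 56.4),
    ("vanilla_rag", "2WikiMQA", 40.5, 58.3),
    ("vanilla_rag", "MuSiQue", 24.1, 29.7),
    ("kg_baseline", "HotpotQA", 45.1, 56.4),
    ("kg_baseline", "2WikiMQA", 42.8, 58.3),
    ("advanced_rag", "HotpotQA", 51.8, 58.9),
]


def f1_gain(baseline: float, reported: float) -> float:
    """Absolute F1 improvement in points, rounded to one decimal like the table."""
    return round(reported - baseline, 1)


if __name__ == "__main__":
    for group, benchmark, baseline, reported in ROWS:
        print(f"{group:13s} {benchmark:9s} {f1_gain(baseline, reported):+.1f}")
```

Rounding to one decimal avoids floating-point noise in the subtraction, so the printed gains can be compared directly with the Δ column.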