| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **Performance comparison using LLaMA3-8B-Instruct as the base model.** | | | | |
| PopQA | Accuracy | 57.4 | 63.8 | +6.4 |
| Natural Questions (NQ) | Accuracy | 47.7 | 58.3 | +10.6 |
| TriviaQA | Accuracy | 78.4 | 87.0 | +8.6 |
| **Performance comparison using LLaMA2-hf-7b as the base model.** | | | | |
| Natural Questions (NQ) | Accuracy | 29.3 | 52.6 | +23.3 |
| RGB | Accuracy | 48.2 | 53.9 | +5.7 |
| **Efficiency comparison showing RPO matches standard RAG speed.** | | | | |
| Inference Efficiency | LLM Calls | Multiple (Adaptive) | 1 | Reduced |