Main results demonstrating Tok-RAG's performance against baselines across different LLM backbones on QA datasets.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Natural Questions | EM | 44.3 | 46.1 | +1.8 |
| TriviaQA | EM | 73.2 | 74.5 | +1.3 |
| PopQA | EM | 48.5 | 50.2 | +1.7 |
| Natural Questions | EM | 53.4 | 55.1 | +1.7 |
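
The Δ column can be checked against the other two columns with a few lines of Python. This is a minimal sketch, assuming Δ is defined as "This Paper" minus "Baseline" in EM points. The names `ROWS`, `recompute_deltas`, and `find_mismatches` are illustrative and do not come from the paper.

```python
# Rows copied from the table: (benchmark, baseline EM, reported EM, reported delta).
# Natural Questions appears twice in the table; both rows are kept as-is.
ROWS = [
    ("Natural Questions", 44.3, 46.1, 1.8),
    ("TriviaQA", 73.2, 74.5, 1.3),
    ("PopQA", 48.5, 50.2, 1.7),
    ("Natural Questions", 53.4, 55.1, 1.7),
]


def recompute_deltas(rows):
    """Return reported-minus-baseline for each row, rounded to one decimal."""
    # Assumption: delta = "This Paper" - "Baseline".
    return [round(ours - base, 1) for _, base, ours, _ in rows]


def find_mismatches(rows):
    """Return the rows whose recomputed delta differs from the reported one."""
    return [row for row, d in zip(rows, recompute_deltas(rows)) if d != row[3]]


if __name__ == "__main__":
    for (name, base, ours, _), delta in zip(ROWS, recompute_deltas(ROWS)):
        print(f"{name}: {base} -> {ours} ({delta:+.1f})")
    print("mismatches:", find_mismatches(ROWS))
```

Rounding to one decimal place matches the precision of the table and avoids spurious floating-point differences in the comparison.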