| Benchmark | Metric | Baseline | This Paper | Δ (This Paper - Baseline) |
|---|---|---|---|---|
| **Performance on AmbigQA (Web Search Setting)** | | | | |
| AmbigQA | F1 | 51.15 | 53.25 | +2.10 |
| AmbigQA | F1 | 53.64 | 54.14 | +0.50 |
| **Performance on PopQA (Web Search Setting)** | | | | |
| PopQA | F1 | 50.15 | 54.34 | +4.19 |
| **Performance on HotpotQA (Local Dense Retrieval Setting)** | | | | |
| HotpotQA | F1 | 39.52 | 41.67 | +2.15 |
| **Cost and Latency Efficiency (HotpotQA)** | | | | |
| HotpotQA | Latency (s/query) | 2.37 | 1.34 | -1.03 |
| HotpotQA | Cost ($/1k queries) | 1.25 | 0.35 | -0.90 |
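
The Δ column is the "This Paper" value minus the "Baseline" value, so positive is better for F1 and negative is better for latency and cost. A minimal sketch for re-deriving the column from the table follows; the names `ROWS`, `delta`, and `relative_change_pct` are illustrative assumptions, and the relative change is a derived quantity that the table itself does not report.

```python
# Rows copied from the table above:
# (benchmark, metric, baseline, this_paper, reported_delta)
ROWS = [
    ("AmbigQA", "F1", 51.15, 53.25, 2.10),
    ("AmbigQA", "F1", 53.64, 54.14, 0.50),
    ("PopQA", "F1", 50.15, 54.34, 4.19),
    ("HotpotQA", "F1", 39.52, 41.67, 2.15),
    ("HotpotQA", "Latency (s/query)", 2.37, 1.34, -1.03),
    ("HotpotQA", "Cost ($/1k queries)", 1.25, 0.35, -0.90),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute change, rounded to the two decimals used in the table."""
    return round(ours - baseline, 2)


def relative_change_pct(baseline: float, ours: float) -> float:
    """Change relative to the baseline, in percent (not reported in the table)."""
    return 100.0 * (ours - baseline) / baseline


if __name__ == "__main__":
    for benchmark, metric, baseline, ours, reported in ROWS:
        d = delta(baseline, ours)
        rel = relative_change_pct(baseline, ours)
        # Flag any row whose recomputed delta disagrees with the reported one.
        status = "ok" if abs(d - reported) < 1e-9 else "MISMATCH"
        print(f"{benchmark:9s} {metric:20s} delta={d:+.2f} ({rel:+.1f}%) {status}")
```

Running the script prints one line per row and marks any row where the recomputed difference does not match the reported Δ.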