Main results comparing DynaSearcher (7B) against baselines on standard multi-hop QA datasets.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| HotpotQA | F1 | 61.8 | 66.1 | +4.3 |
| 2WikiMultiHopQA | F1 | 67.1 | 72.0 | +4.9 |
| HotpotQA | F1 | 60.6 | 66.1 | +5.5 |

Ablation studies validating the contributions of the Knowledge Graph (KG) and Multi-Reward (MR) components.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| HotpotQA | F1 | 61.8 | 66.1 | +4.3 |
| HotpotQA | F1 | 63.5 | 66.1 | +2.6 |
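In both tables, Δ is the absolute difference in F1 points (This Paper minus Baseline). The short Python sketch below recomputes Δ from the transcribed scores as a consistency check. It also reports the gain relative to the baseline score, which the source does not report and which is included here only for illustration. The table labels in `ROWS` and the helper names are hypothetical.

```python
import math

# (benchmark, table, baseline F1, this-paper F1, reported delta), transcribed from the tables.
ROWS = [
    ("HotpotQA", "main", 61.8, 66.1, 4.3),
    ("2WikiMultiHopQA", "main", 67.1, 72.0, 4.9),
    ("HotpotQA", "main", 60.6, 66.1, 5.5),
    ("HotpotQA", "ablation", 61.8, 66.1, 4.3),
    ("HotpotQA", "ablation", 63.5, 66.1, 2.6),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute F1 gain in points, rounded to one decimal as in the table."""
    return round(ours - baseline, 1)


def relative_gain(baseline: float, ours: float) -> float:
    """Gain as a percentage of the baseline score (hypothetical; not reported in the source)."""
    return round(100.0 * (ours - baseline) / baseline, 1)


def check_rows(rows) -> list:
    """Return the rows whose reported delta disagrees with the recomputed one."""
    return [r for r in rows if not math.isclose(delta(r[2], r[3]), r[4])]


if __name__ == "__main__":
    for name, table, base, ours, reported in ROWS:
        print(f"{table:8} {name:16} delta={delta(base, ours):+.1f} "
              f"(reported {reported:+.1f}), relative {relative_gain(base, ours):+.1f}%")
    print("mismatches:", check_rows(ROWS))
```

Every reported Δ matches the recomputed difference, so `check_rows` returns an empty list. The relative figures help when comparing gains across benchmarks that have different baseline scores.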