Results on scientific reasoning tasks (PubMedQA, BioASQ, ProcessBank) using a restricted 135-node UMLS KG, plus commonsense and truthfulness benchmarks. GIVE consistently outperforms baselines, and smaller models with GIVE often beat larger models without it.

| Benchmark | Metric | Baseline | GIVE | Δ |
|---|---|---|---|---|
| PubMedQA | Accuracy | 75.6 | 78.2 | +2.6 |
| ProcessBank | Accuracy | 71.6 | 74.8 | +3.2 |
| CommonsenseQA | Accuracy | 68.2 | 73.1 | +4.9 |
| CommonsenseQA (10% KG) | Accuracy | 64.1 | 69.5 | +5.4 |
| TruthfulQA | Win rate (%) | 32.7 | 50.3 | +17.6 |