| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| **News Chat (DSTC7):** consolidated knowledge and feedback improve performance. | | | | |
| DSTC7 | Knowledge F1 (KF1) | 26.71 | 36.41 | +9.70 |
| **Customer Service (DSTC11):** results show reduced hallucination. | | | | |
| DSTC11 | Knowledge F1 (KF1) | 31.33 | 37.41 | +6.08 |
| DSTC11 | Knowledge F1 (KF1) | 34.07 | 37.41 | +3.34 |
| **Open-domain QA (Wiki QA):** results highlight the need for knowledge consolidation on multi-hop tasks. | | | | |
| Wiki QA | F1 | 0.59 | 11.80 | +11.21 |
| Wiki QA | F1 | 2.38 | 8.08 | +5.70 |
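The table does not define Knowledge F1. A common definition, assumed here rather than taken from the source, is unigram F1 between the generated response and the gold knowledge snippet. The sketch below implements that assumed metric (`unigram_f1` is a hypothetical helper) and re-checks the Δ column as This Paper minus Baseline, using the values copied from the table.

```python
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """Unigram F1 between two lowercased, whitespace-tokenized strings.

    Assumption: KF1 is computed this way (prediction = generated response,
    reference = gold knowledge); the source table does not define it.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection: each shared token counts at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# (benchmark, baseline, this_paper, reported_delta), copied from the table.
ROWS = [
    ("DSTC7", 26.71, 36.41, 9.70),
    ("DSTC11", 31.33, 37.41, 6.08),
    ("DSTC11", 34.07, 37.41, 3.34),
    ("Wiki QA", 0.59, 11.80, 11.21),
    ("Wiki QA", 2.38, 8.08, 5.70),
]


def mismatched_deltas(rows, tol=1e-6):
    """Return the rows whose reported delta differs from this_paper - baseline."""
    return [r for r in rows if abs((r[2] - r[1]) - r[3]) > tol]


if __name__ == "__main__":
    print(unigram_f1("the cat sat", "the cat ran"))  # 2 of 3 tokens overlap
    print(mismatched_deltas(ROWS))  # empty list: every reported delta matches
```

The table's scores appear to be on a 0 to 100 scale, so a per-example score from this helper would presumably be averaged and multiplied by 100 before comparison; that scaling is also an assumption.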