| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Analysis of consistency shows that factual answers remain stable under prompt perturbation, while non-factual answers fluctuate significantly. | | | | |
| HotpotQA | Consistency (Factual Group) | Not reported in the paper | 87.53 | Not reported in the paper |
| HotpotQA | Consistency (Non-Factual Group) | Not reported in the paper | 45.19 | Not reported in the paper |
| NQ-open | Consistency (Factual Group) | Not reported in the paper | 90.3 | Not reported in the paper |
| NQ-open | Consistency (Non-Factual Group) | Not reported in the paper | 43.28 | Not reported in the paper |
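
To make the contrast between the two groups concrete, the short sketch below computes the difference between factual and non-factual consistency for each benchmark, using only the values in the table. The "gap" quantity and the helper name `consistency_gap` are illustrative assumptions, not something the paper reports or defines.

```python
# Consistency scores copied from the "This Paper" column of the table.
scores = {
    "HotpotQA": {"factual": 87.53, "non_factual": 45.19},
    "NQ-open": {"factual": 90.3, "non_factual": 43.28},
}


def consistency_gap(groups):
    """Factual minus non-factual consistency.

    Hypothetical helper for illustration; the paper does not define this gap.
    """
    return round(groups["factual"] - groups["non_factual"], 2)


# Gap per benchmark, in the same units as the table.
gaps = {name: consistency_gap(groups) for name, groups in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.2f}")
```

Under this reading, the factual group is more consistent than the non-factual group by roughly 42 points on HotpotQA and 47 points on NQ-open.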