Calibration performance metrics comparing the vanilla model to the proposed calibrator-controlled pipeline.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| TriviaQA (Test) | Correctness of confident (`<HI>`) answers | 13.7 | 38.9 | +25.2 |
| TriviaQA (Test) | Overall Accuracy | 4.8 | 5.1 | +0.3 |
| TriviaQA (Test) | Percentage of answers generated confidently (`<HI>`) | 29.45 | 1.8 | -27.65 |
| TriviaQA (Test) | Expected Calibration Error (ECE) | Not reported in the paper | 0.018 | Not reported in the paper |
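
For readers unfamiliar with the ECE row, the following is a minimal sketch of how Expected Calibration Error is commonly computed: predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration; the table does not state which binning scheme produced the reported value.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width bins over [0, 1].

    confidences: predicted probabilities that each answer is correct.
    correct:     booleans marking whether each answer was actually correct.
    n_bins:      number of equal-width bins (an assumption, not from the source).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Sum |accuracy - mean confidence| per bin, weighted by the bin's share.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Under this definition a lower value indicates better calibration, and a model whose stated confidence matches its accuracy in every bin scores 0.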