LAMBADA results show FPTQ W4A8 maintains accuracy very close to the FP16 baseline across model scales.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| LAMBADA | Accuracy | 79.5653 | 78.7114 | -0.8539 |
| LAMBADA | Accuracy | 78.7891 | 78.7114 | -0.0777 |

Common Sense QA (Avg) results demonstrate robustness, outperforming QAT methods in some cases.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Common Sense QA (Avg) | Accuracy | 75.05 | 76.81 | +1.76 |
| Common Sense QA (Avg) | Accuracy | 74.48 | 73.42 | -1.06 |
| Common Sense QA (Avg) | Accuracy | 73.77 | 73.63 | -0.14 |

MMLU results show some degradation for smaller models but stability at scale.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MMLU | Accuracy (Avg) | 44.14 | 40.96 | -3.18 |
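The Δ column is simply the quantized ("This Paper") score minus the baseline score. A minimal sketch that recomputes the deltas from the table above (benchmark labels and values taken directly from the table; the script itself is illustrative, not part of the paper's tooling):

```python
# Recompute the Δ column: delta = quantized score - baseline score.
rows = [
    # (benchmark, baseline, this_paper)
    ("LAMBADA", 79.5653, 78.7114),
    ("LAMBADA", 78.7891, 78.7114),
    ("Common Sense QA (Avg)", 75.05, 76.81),
    ("Common Sense QA (Avg)", 74.48, 73.42),
    ("Common Sense QA (Avg)", 73.77, 73.63),
    ("MMLU", 44.14, 40.96),
]

for name, baseline, quantized in rows:
    # Round to 4 decimals to match the table's reported precision.
    delta = round(quantized - baseline, 4)
    print(f"{name}: {delta:+.4f}")
```

Running this reproduces the Δ values reported in the tables (e.g. `-0.8539` for the first LAMBADA row and `+1.76` for the first Common Sense QA row).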