Comparison against vanilla fine-tuning across multiple PLMs shows consistent improvements, especially on smaller datasets such as RTE (gains of up to +5.98 accuracy points).

| Benchmark | Metric | Baseline (vanilla fine-tuning) | This Paper | Δ (points) |
|---|---|---|---|---|
| RTE | Accuracy | 78.49 | 84.47 | +5.98 |
| RTE | Accuracy | 78.58 | 81.65 | +3.07 |
| GLUE Avg | Average Score | 82.60 | 83.65 | +1.05 |
| GLUE Avg | Average Score | 82.66 | 83.56 | +0.90 |
| RTE | Accuracy | 70.39 | 71.11 | +0.72 |
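The Δ column is the absolute difference This Paper − Baseline, in accuracy or score points. The following minimal Python sketch recomputes Δ from the rows above and reports the largest gain. The row values are copied from the table. The table does not say which PLM each row corresponds to, so the sketch keys rows by benchmark name only. `ROWS` and `deltas` are hypothetical names used for illustration.

```python
from decimal import Decimal

# (benchmark, baseline, this_paper, reported_delta), copied from the table above.
# Decimal avoids floating-point noise when comparing two-decimal values.
ROWS = [
    ("RTE", "78.49", "84.47", "+5.98"),
    ("RTE", "78.58", "81.65", "+3.07"),
    ("GLUE Avg", "82.60", "83.65", "+1.05"),
    ("GLUE Avg", "82.66", "83.56", "+0.90"),
    ("RTE", "70.39", "71.11", "+0.72"),
]


def deltas(rows):
    """Return (benchmark, recomputed delta, reported delta) for each row."""
    out = []
    for bench, base, ours, reported in rows:
        out.append((bench, Decimal(ours) - Decimal(base), Decimal(reported)))
    return out


if __name__ == "__main__":
    results = deltas(ROWS)
    for bench, computed, reported in results:
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{bench:8s} {computed:+.2f} ({status})")
    best = max(results, key=lambda r: r[1])
    print(f"largest gain: {best[0]} {best[1]:+.2f}")
```

When run, the sketch prints each recomputed Δ. All five values match the reported column.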