iCLP demonstrates strong generalization on out-of-domain mathematical and code datasets compared to base models.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| AIME 2024 + MATH-500 | Accuracy | Not explicitly reported in the paper | Not explicitly reported in the paper | +10% (average) |
| HumanEval + MBPP | Accuracy | Not explicitly reported in the paper | Not explicitly reported in the paper | +9% (average) |
| General Reasoning | Token Cost | Not explicitly reported in the paper | Not explicitly reported in the paper | -10% |