Experiments demonstrating the impact of COTTON-generated CoTs on the CodeT5+ 6B model across different benchmarks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| HumanEval | pass@1 | 26.22 | 42.68 | +16.46 |
| HumanEval-plus | pass@1 | 26.83 | 43.90 | +17.07 |
| OpenEval | pass@1 | 20.22 | 35.39 | +15.17 |
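
The Δ column can be re-derived from the two score columns. The following is a minimal sketch that recomputes it and also reports the gain relative to the baseline. It assumes the pass@1 values are percentages, so Δ is in percentage points. The `RESULTS` dictionary and the function names are hypothetical and used only for illustration; the numbers are copied from the table.

```python
# Scores copied from the table: benchmark -> (baseline pass@1, pass@1 with CoTs).
# Assumption: values are percentages, so differences are percentage points.
RESULTS = {
    "HumanEval": (26.22, 42.68),
    "HumanEval-plus": (26.83, 43.90),
    "OpenEval": (20.22, 35.39),
}


def absolute_gain(baseline: float, improved: float) -> float:
    """Difference in percentage points, rounded to two decimals like the table."""
    return round(improved - baseline, 2)


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return round(100.0 * (improved - baseline) / baseline, 1)


if __name__ == "__main__":
    for name, (baseline, improved) in RESULTS.items():
        delta = absolute_gain(baseline, improved)
        rel = relative_gain(baseline, improved)
        print(f"{name}: delta = +{delta:.2f} points, relative = +{rel:.1f}%")
```

The relative gain is not reported in the table; it is included only because the baselines differ across benchmarks, which makes the absolute differences harder to compare directly.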