| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Results on the custom benchmark (12,406 functions): both repository-specific metrics, Dependency Coverage and Static Validity Rate, improve substantially across all three models. | | | | |
| Custom Benchmark (12,406 functions) | Dependency Coverage | 39.8 | 55.4 | +15.6 |
| Custom Benchmark (12,406 functions) | Static Validity Rate | 41.6 | 65.6 | +24.0 |
| Custom Benchmark (12,406 functions) | Dependency Coverage | 45.9 | 60.3 | +14.4 |
| Results on CoderEval (176 tasks), which measures functional correctness via test cases. | | | | |
| CoderEval | Pass@1 | 20.5 | 28.7 | +8.2 |
| CoderEval | Pass@1 | 36.4 | 45.5 | +9.1 |