Evaluation Setup
Code generation on unseen competitive programming problems
Benchmarks:
- LiveCodeBench v6 (competitive programming; strictly unseen problems)
- AtCoder (Competitive Programming)
- LeetCode (Coding Interview Problems)
Metrics:
- Accuracy (Pass rate)
- Pass@1 (implied by the binary per-problem score)
- Statistical methodology: Not explicitly reported in the paper
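Since the paper reports Pass@1 without spelling out the estimator, a common choice is the unbiased pass@k estimator from the HumanEval evaluation protocol; whether this paper uses it is an assumption, but it is the standard way to turn binary per-sample scores into a pass rate:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn per problem,
    c of them pass all tests. For k=1 this reduces to c/n."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem, pass@1 is just the plain pass rate:
print(pass_at_k(n=1, c=1, k=1))   # 1.0
print(pass_at_k(n=10, c=3, k=1))  # ~0.3
```

Averaging `pass_at_k` over all benchmark problems gives the reported accuracy.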
Key Results
Difficulty filtering significantly alters the dataset composition, shifting the distribution toward harder problems:

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MicroCoder construction | Easy Problem Ratio (%) | 40 | 20 | -20 |
| MicroCoder construction | Dataset Size (% retained) | 100 | 70 | -30 |
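As a sanity check on those numbers, assume the percentages apply to a nominal pool of 100 problems (an illustrative assumption, not a figure from the paper). The filtering then removes 30 problems, of which most but not all are easy:

```python
# Nominal pool of 100 problems, 40% tagged easy before filtering.
total, easy = 100, 40
kept_total = 70                       # 30% dataset reduction
kept_easy = round(0.20 * kept_total)  # 20% easy after filtering -> 14

dropped_easy = easy - kept_easy                      # 26 easy removed
dropped_hard = (total - kept_total) - dropped_easy   # 4 non-easy removed
print(dropped_easy, dropped_hard)  # 26 4
```

So under these assumed counts, roughly 26 of the 30 removed problems are easy, which is consistent with the shift toward harder problems claimed above.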
Main Takeaways
- Difficulty-aware curation (MicroCoder) yields 3x larger performance gains within the first 300 training steps compared to baselines, indicating much higher training efficiency.
- The dataset achieves up to 17.2% relative gains on medium and hard problems, validating that removing easy data helps models generalize better to difficult tasks.
- Privately collected datasets (part of MicroCoder) contain more test cases and show distinct clusters compared to open-source sets like TACO, providing complementary coverage.