Evaluation Setup
Speculative decoding with fine-tuned target models (mathematical reasoning, etc.)
Benchmarks:
- Qwen2.5-Math-7B (Mathematical Reasoning)
Metrics:
- Average Acceptance Length (tau)
- Training Cost (relative to full retraining)
- Statistical methodology: Not explicitly reported in the paper
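The average acceptance length τ is the mean number of tokens committed per draft-verify cycle: the draft tokens the target model accepts, plus the one token the target itself contributes at the end of each cycle. A minimal sketch of the computation (the function name and input format are illustrative assumptions, not from the paper):

```python
def average_acceptance_length(accepted_counts):
    """Mean tokens committed per draft-verify cycle (tau).

    accepted_counts: number of draft tokens accepted in each cycle.
    Each cycle also commits one token from the target model itself
    (the corrected token on rejection, or a bonus token on full acceptance),
    hence the +1 per cycle.
    """
    return sum(c + 1 for c in accepted_counts) / len(accepted_counts)
```

For example, cycles accepting 3, 4, and 2 draft tokens give τ = (4 + 5 + 3) / 3 = 4.0.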
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Qwen2.5-Math-7B | Average Acceptance Length | 4.37 | 4.79 | +0.42 |
| Qwen2.5-Math-7B | Training Cost (%) | 100.0 | 60.8 | -39.2 |
Main Takeaways
- EDA successfully restores and improves average acceptance length for fine-tuned target models compared to naive reuse of base draft models.
- The shared-private architecture allows for parameter-efficient adaptation, requiring updates to only a fraction of the parameters.
- Self-generation and data selection are critical: training on target-generated data aligns objectives, and selecting high-deviation samples improves data efficiency.
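The high-deviation selection idea in the last point can be sketched as ranking target-generated samples by how far the base draft model's predictive distribution deviates from the fine-tuned target's, then keeping the top-k. Scoring by KL divergence is an assumption about the criterion, and all names and the sample format are hypothetical:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_high_deviation(samples, k):
    """Keep the k samples where the base draft model deviates most
    from the fine-tuned target.

    samples: list of (sample_id, base_dist, target_dist) triples,
    where the dists are next-token probability vectors (hypothetical format).
    """
    scored = sorted(samples,
                    key=lambda s: kl_divergence(s[2], s[1]),
                    reverse=True)
    return [s[0] for s in scored[:k]]
```

A sample where the two distributions agree scores near zero and is filtered out, so the adaptation budget is spent where the base draft model is most misaligned with the fine-tuned target.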