Evaluation Setup
Pre-train on a fine-grained source dataset, then fine-tune on a coarser-grained target dataset.
Benchmarks:
- ImageNet21k to ImageNet1k (Image Classification Transfer)
- iNaturalist 2021 (intra-dataset transfer, fine-grained species to superclass)
Metrics:
- Top-1 Validation Accuracy
- Validation Error
- Statistical methodology: Not explicitly reported in the paper
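The setup above hinges on relabeling the source dataset at different granularities of its label hierarchy. A minimal sketch of this idea (pure Python, with a hypothetical toy taxonomy and helper name, not the paper's code): each leaf class keeps its ancestor path, and the pre-training label is the ancestor at a chosen depth, so depth 0 gives the coarsest labels and a large depth recovers the fine-grained leaves.

```python
# Hypothetical toy taxonomy: each leaf class maps to its ancestor path
# (root -> ... -> leaf). Real hierarchies (WordNet for ImageNet21k,
# the iNaturalist taxonomy) are far larger but have the same shape.
HIERARCHY = {
    "golden_retriever": ["animal", "mammal", "dog", "golden_retriever"],
    "siamese_cat":      ["animal", "mammal", "cat", "siamese_cat"],
    "oak":              ["plant", "tree", "oak"],
}

def relabel(leaf, depth):
    """Return the pre-training label for `leaf` at hierarchy depth `depth`.

    depth=0 is the coarsest level (just below the root); depths past the
    end of the path fall back to the leaf itself, i.e. the finest labels.
    """
    path = HIERARCHY[leaf]
    return path[min(depth, len(path) - 1)]

# Coarse pre-training labels (depth 0): only 2 classes remain.
coarse = {leaf: relabel(leaf, 0) for leaf in HIERARCHY}
# Fine pre-training labels (deep cut): every leaf keeps its own class.
fine = {leaf: relabel(leaf, 10) for leaf in HIERARCHY}
```

Sweeping `depth` is what produces the different pre-training granularities whose downstream accuracies are compared below.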
Key Results
Results on ImageNet21k to ImageNet1k transfer using ViT-B/16 demonstrate that pre-training on finer labels (leaf nodes) yields the best downstream accuracy.

| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| ImageNet1k | Top-1 Validation Accuracy | 77.91 | 82.51 | +4.60 |
| ImageNet1k | Top-1 Validation Accuracy | 77.91 | 81.28 | +3.37 |
| ImageNet1k | Top-1 Validation Accuracy | 77.91 | 80.26 | +2.35 |
| ImageNet1k | Top-1 Validation Accuracy | 77.91 | 72.75 | -5.16 |
Main Takeaways
- Label granularity matters: Pre-training on leaf labels (finest) consistently outperforms coarser levels and baselines.
- Coarse pre-training can be harmful: Using very coarse pre-training labels (e.g., 38 classes for ImageNet21k) performs worse than training from scratch (baseline).
- Hierarchy alignment is critical: On iNaturalist, manual hierarchies outperform clustering-based labels, showing that semantic alignment between source and target labels is necessary for effective transfer.
- Granularity has a 'sweet spot': performance follows a U-shaped error curve; a unique label per sample (the extreme fine-grained case) fails, as does the extreme coarse-grained case.
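The fine-to-superclass evaluation behind the iNaturalist takeaway can be illustrated with a small sketch (hypothetical species-to-supercategory mapping and function name, not the paper's code): fine-grained predictions are scored at the superclass level by collapsing both prediction and target through the taxonomy, which is exactly where a semantically misaligned (e.g., clustering-based) mapping would lose credit.

```python
# Hypothetical species -> supercategory mapping (iNaturalist-style).
SUPERCLASS = {
    "Quercus_alba":      "Plants",
    "Quercus_rubra":     "Plants",
    "Bombus_terrestris": "Insects",
    "Apis_mellifera":    "Insects",
}

def superclass_accuracy(preds, targets):
    """Score species-level predictions at the superclass level.

    A prediction counts as correct if it falls in the same supercategory
    as the target, mirroring fine-to-superclass transfer evaluation.
    """
    hits = sum(SUPERCLASS[p] == SUPERCLASS[t] for p, t in zip(preds, targets))
    return hits / len(targets)

preds   = ["Quercus_rubra", "Apis_mellifera",    "Quercus_alba"]
targets = ["Quercus_alba",  "Bombus_terrestris", "Quercus_alba"]
print(superclass_accuracy(preds, targets))  # -> 1.0: all match at superclass level
```

Even though two of the three species predictions are wrong at the fine level, all three land in the correct supercategory, which is why a semantically aligned hierarchy is what makes this kind of transfer measurable at all.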