
Beyond Random Augmentations: Pretraining with Hard Views

Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter
University of Freiburg, ELLIS Institute Tübingen
arXiv (2023)

📝 Paper Summary

Tags: Self-Supervised Learning (SSL), Contrastive Learning, Data Augmentation
Hard View Pretraining (HVP) improves self-supervised learning by actively selecting the most challenging pair of augmented views from a larger pool of random candidates during pretraining.
Core Problem
Standard self-supervised learning relies on completely random data augmentations, which may produce views that are too easy or not informative enough for the model to learn effective representations.
Why it matters:
  • Current random augmentation policies are suboptimal and do not adapt to the model's learning state.
  • Existing hard view methods often require complex auxiliary networks (like adversarial generators) or sensitive hyperparameter tuning.
  • Inefficient view selection slows down convergence and limits the final performance of representations on downstream tasks.
Concrete Example: In standard SimSiam, two random crops might overlap significantly or retain the most obvious features, making the matching task trivial. HVP generates four random crops and selects the specific pair that yields the highest loss, forcing the model to solve a harder recognition task.
Key Novelty
Hard View Pretraining (HVP)
  • Iteratively sample multiple random views (e.g., 4 instead of 2) for each image during pretraining.
  • Forward all views through the model to compute losses for all possible pairs.
  • Select the single pair with the highest loss (the 'hardest' view pair) for the backward pass, discarding the easier ones.
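The selection step above can be sketched in a few lines. This is a minimal, framework-agnostic illustration (not the authors' implementation): `views` stands in for the forwarded augmented views and `loss_fn` for the SSL objective; both names are hypothetical.

```python
import itertools

def select_hardest_pair(views, loss_fn):
    """Return the view pair with the highest loss (the 'hardest' pair).

    Sketch of HVP's selection step: losses are computed for all
    possible pairs of candidate views, and only the hardest pair
    is kept for the backward pass; the easier pairs are discarded.
    """
    best_pair, best_loss = None, float("-inf")
    for v1, v2 in itertools.combinations(views, 2):
        loss = loss_fn(v1, v2)
        if loss > best_loss:
            best_pair, best_loss = (v1, v2), loss
    return best_pair, best_loss

# Toy stand-in: views are scalars and the "loss" is their absolute
# difference, so the hardest pair is the most dissimilar one.
views = [0.1, 0.4, 0.9, 0.5]
pair, loss = select_hardest_pair(views, lambda a, b: abs(a - b))
```

In practice the loss for each pair is computed in a no-gradient forward pass, and only the selected pair contributes to the gradient update.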
Evaluation Highlights
  • Achieves 78.8% linear evaluation accuracy on ImageNet-1k with DINO ViT-B/16 (400 epochs), surpassing the official baseline of 78.2%.
  • Consistently improves linear evaluation accuracy by ~1% on average across SimSiam, DINO, iBOT, and SimCLR (ResNet-50) on 100- and 300-epoch schedules.
  • Demonstrates transfer gains on object detection and segmentation, showing robust generalization beyond classification.
Breakthrough Assessment
7/10
Simple, plug-and-play method that consistently improves major SSL baselines without complex auxiliary networks. While the gains are moderate (~1%), the learning-free nature and broad applicability are significant.