Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

Itamar Tsayag, Ofir Lindenbaum
arXiv (2026)
📝 Paper Summary

Tags: Neural Network Pruning · Sparse Neural Networks
This paper proposes a method to discover strong lottery tickets—sparse subnetworks that perform well without weight training—by optimizing continuously relaxed Bernoulli gates while keeping the original network weights frozen.
Core Problem
Finding 'strong lottery tickets' (sparse subnetworks that work well without training) currently relies on the Edge-Popup algorithm, which uses non-differentiable score-based selection, leading to inefficient optimization and poor scalability.
Why it matters:
  • Over-parameterized models incur prohibitive memory and computational costs, limiting deployment on resource-constrained devices
  • Current methods like Edge-Popup struggle to scale to larger architectures due to reliance on non-differentiable gradient estimators
  • Efficiently finding strong lottery tickets could allow high-performance inference using only a fraction of a model's parameters without ever training the weights
Concrete Example: When using Edge-Popup to find a strong lottery ticket in a ResNet50, the algorithm must use a non-differentiable estimator to select edges based on scores, resulting in only ~50% sparsity for good accuracy. The proposed method uses differentiable gates to achieve >90% sparsity at comparable accuracy.
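To make the contrast concrete, here is a minimal sketch of the score-based selection at the heart of Edge-Popup: a binary mask that keeps only the top-scoring fraction of weights. The function name `edge_popup_mask` is hypothetical; the real algorithm pairs this non-differentiable top-k step with a straight-through estimator so the scores can still receive gradients.

```python
def edge_popup_mask(scores, sparsity):
    """Edge-Popup-style mask selection (hypothetical sketch):
    keep the top-(1 - sparsity) fraction of weights by score,
    prune the rest. Note this hard top-k step has no useful
    gradient, which is why Edge-Popup needs a straight-through
    estimator to update the scores."""
    k = round(len(scores) * (1.0 - sparsity))
    if k <= 0:
        return [0 for _ in scores]
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= threshold else 0 for s in scores]
```

For example, at 50% sparsity over scores `[0.1, 0.9, 0.5, 0.2]`, only the two highest-scoring weights survive: the mask is `[0, 1, 1, 0]`.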
Key Novelty
Continuously Relaxed Bernoulli Gates for Strong Lottery Tickets
  • Applies a learnable mask (gate) to every weight in a randomly initialized network, where the mask values are drawn from a continuous relaxation of the Bernoulli distribution
  • Allows standard gradient descent to optimize the probability of each weight being active, even though the weights themselves are never updated
  • Enables end-to-end differentiable optimization of the network structure (sparsity) alongside an L0 regularization term, avoiding the need for straight-through estimators
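A common way to realize such continuously relaxed Bernoulli gates is the "hard concrete" distribution of Louizos et al. (L0 regularization): a reparameterized sigmoid sample stretched and clipped into [0, 1], with a closed-form probability of being nonzero that serves as the differentiable L0 penalty. The sketch below uses those standard parameters (`beta`, `gamma`, `zeta`); the paper's exact parameterization may differ.

```python
import math
import random

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1, u=None):
    """Sample a relaxed Bernoulli ("hard concrete") gate in [0, 1].
    log_alpha is the learnable per-weight parameter; u is the uniform
    noise of the reparameterization trick, so the sample stays
    differentiable w.r.t. log_alpha."""
    if u is None:
        u = random.random()
    s = 1.0 / (1.0 + math.exp(-((math.log(u) - math.log(1 - u)) + log_alpha) / beta))
    s_bar = s * (zeta - gamma) + gamma      # stretch to (gamma, zeta)
    return min(1.0, max(0.0, s_bar))        # clip into [0, 1]

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Probability the gate is nonzero: the differentiable L0
    penalty summed over all gates during training."""
    return 1.0 / (1.0 + math.exp(-(log_alpha - beta * math.log(-gamma / zeta))))
```

During training, each frozen weight is multiplied by its sampled gate, and the loss adds `expected_l0` over all gates; pushing `log_alpha` very negative drives a gate (and its weight) to exactly zero, which is how the sparse subnetwork emerges without any weight updates.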
Evaluation Highlights
  • Achieves 91.5% sparsity on ResNet50 (CIFAR-10) with 83.1% accuracy, nearly double the sparsity of Edge-Popup at comparable performance
  • Discovers the first known Strong Lottery Tickets for Vision Transformers (ViT-base), retaining 90% sparsity with 76% accuracy without weight training
  • Outperforms prior strong lottery ticket methods on LeNet-300-100 by 11 percentage points in accuracy (96% vs 85%)
Breakthrough Assessment
8/10
Significantly improves upon the standard Edge-Popup algorithm by making the process differentiable, yielding much higher sparsity. Successfully extends strong lottery tickets to Transformers for the first time.