
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
Google Research
arXiv (2023)
P13N MM

📝 Paper Summary

Text-to-Image Personalization, Efficient Fine-tuning
HyperDreamBooth accelerates subject personalization by using a hypernetwork to predict lightweight model weights from a single image, followed by fast rank-relaxed fine-tuning.
Core Problem
Existing personalization methods like DreamBooth are slow (taking minutes per subject) and storage-heavy (saving full model weights), limiting real-time application and scalability.
Why it matters:
  • Personalizing generative AI is crucial for user creativity, but 5-minute wait times degrade user experience
  • Storing 1GB+ models per user/subject is prohibitively expensive for large-scale deployment
  • Current fast methods often compromise on subject fidelity or editability compared to full fine-tuning
Concrete Example: Training DreamBooth on a specific person's face takes ~5 minutes and produces a >1GB file. A user who wants to generate that person in a 'cartoon style' immediately faces an unacceptable delay, and storing thousands of such models on a platform is infeasible.
Key Novelty
HyperDreamBooth (HyperNetwork + Lightweight DreamBooth)
  • Predicts personalized weights directly from a single image using a HyperNetwork, rather than optimizing them via gradient descent from scratch
  • Introduces Lightweight DreamBooth (LiDB), a decomposition of LoRA weights using a random orthogonal basis to create a tiny (roughly 100KB) personalization space
  • Uses rank-relaxed fine-tuning: initializes with low-rank predictions, then increases rank during a brief fine-tuning phase to capture high-frequency details
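The two ideas in the list above (a LoRA residual factored through frozen random orthogonal bases, and a later move to a higher rank) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the factorization `A_aux @ A_train @ B_train @ B_aux`, the helper names, the layer sizes, and the way `relax_rank` folds the prediction into the base weight are all hypothetical choices made for the example.

```python
import numpy as np


def orthonormal_columns(rows, cols, rng):
    """Random (rows, cols) matrix with orthonormal columns; needs rows >= cols."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q


def make_lidb_layer(n, m, a, b, r, rng):
    """Frozen auxiliary bases plus small trainable factors for one layer.

    Assumed factorization (a sketch, not the paper's code):
        delta_W = A_aux @ A_train @ B_train @ B_aux
    """
    return {
        "A_aux": orthonormal_columns(n, a, rng),        # (n, a), frozen
        "A_train": 0.01 * rng.standard_normal((a, r)),  # (a, r), trainable
        "B_train": 0.01 * rng.standard_normal((r, b)),  # (r, b), trainable
        "B_aux": orthonormal_columns(m, b, rng).T,      # (b, m), frozen
    }


def lidb_delta(layer):
    """Residual added to a frozen base weight of shape (n, m)."""
    return layer["A_aux"] @ layer["A_train"] @ layer["B_train"] @ layer["B_aux"]


def trainable_count(layer):
    """Number of values that must be stored per subject for this layer."""
    return layer["A_train"].size + layer["B_train"].size


def relax_rank(w_base, delta_pred, new_rank, rng):
    """Assumed rank relaxation: fold the low-rank prediction into the weight,
    then start a fresh higher-rank LoRA residual that is initially zero."""
    n, m = w_base.shape
    w_start = w_base + delta_pred
    a_new = 0.01 * rng.standard_normal((n, new_rank))
    b_new = np.zeros((new_rank, m))  # zero, so fine-tuning starts at w_start
    return w_start, a_new, b_new


rng = np.random.default_rng(0)
n, m, a, b, r = 320, 320, 100, 50, 1  # illustrative sizes, not from the summary
layer = make_lidb_layer(n, m, a, b, r, rng)
delta = lidb_delta(layer)
print("trainable values per layer:", trainable_count(layer))
print("plain LoRA at the same rank:", (n + m) * r)
```

With these illustrative sizes, only the `a*r + r*b` trainable values need to be stored per layer, because the auxiliary bases can be regenerated from a shared seed. In this sketch, a hypernetwork would output `A_train` and `B_train` directly, and `relax_rank` would then provide the starting point for the brief higher-rank fine-tuning phase.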
Architecture
Figure 3: The HyperNetwork architecture, which predicts weights for the diffusion model.
Evaluation Highlights
  • Achieves personalization in ~20 seconds (25x faster than DreamBooth, 125x faster than Textual Inversion)
  • Produces personalized models that are ~120KB in size (10,000x smaller than DreamBooth)
  • Maintains subject fidelity and style editability comparable to DreamBooth while using only one reference image
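As a back-of-the-envelope check, the ratios above can be turned into implied baseline costs. The sketch below uses only the approximate figures quoted in this summary and assumes decimal units (1 GB = 1,000,000 KB); the variable names are illustrative.

```python
# Figures quoted in the summary (all approximate).
hdb_seconds = 20       # "~20 seconds" per subject
hdb_kilobytes = 120    # "~120KB" per personalized model

# Baselines implied by the stated speedup and size ratios.
implied_dreambooth_seconds = hdb_seconds * 25           # 25x faster than DreamBooth
implied_textual_inversion_seconds = hdb_seconds * 125   # 125x faster than Textual Inversion
implied_dreambooth_gigabytes = hdb_kilobytes * 10_000 / 1_000_000  # 10,000x smaller

print("DreamBooth (min):", implied_dreambooth_seconds / 60)
print("Textual Inversion (min):", implied_textual_inversion_seconds / 60)
print("DreamBooth checkpoint (GB):", implied_dreambooth_gigabytes)
```

Under these figures the implied baselines are about 500 seconds for DreamBooth, about 2,500 seconds for Textual Inversion, and about 1.2 GB for a DreamBooth checkpoint. The size matches the "1GB+" figure quoted earlier, and the implied DreamBooth time is of the same order as the ~5 minutes in the Concrete Example, which suggests the quoted numbers are rounded.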
Breakthrough Assessment
9/10
Drastically reduces personalization time and size (orders of magnitude) while maintaining quality, solving the two biggest bottlenecks for deploying personalized T2I models at scale.