
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu
ByteDance
arXiv.org (2025)
MM P13N Pretraining

📝 Paper Summary

Identity-Preserved Image Generation, Text-to-Image Generation
InfiniteYou injects identity features into DiT-based models like FLUX via a separate residual branch (InfuseNet) rather than modifying attention layers, enhancing identity preservation without compromising generation quality.
Core Problem
Existing identity-preservation methods for DiTs (like FLUX) rely on modifying attention layers via IP-Adapters, which degrades text alignment, aesthetics, and the base model's generation capabilities.
Why it matters:
  • Current methods struggle with 'face copy-paste' artifacts where the identity is preserved but the image looks unnatural or poorly aligned with the text prompt
  • State-of-the-art DiT models like FLUX offer superior generation quality over U-Nets (SDXL), but effective identity-injection modules for them are scarce
  • Modifying attention layers directly (standard practice) entangles text and identity control, causing conflict and reducing the model's aesthetic quality
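To make the entanglement point concrete, here is a minimal NumPy sketch of the decoupled cross-attention idea from the original IP-Adapter: image queries attend to text tokens and to identity tokens separately, and the two results are summed into the same attention output. This is an illustrative simplification, not the FLUX-specific IP-Adapter implementation; the function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def ip_adapter_style_attention(q, k_text, v_text, k_id, v_id, scale=1.0):
    # Hypothetical IP-Adapter-style layer: text and identity are attended
    # separately, then summed into the SAME hidden state the base model's
    # attention layer would output. Raising `scale` strengthens identity
    # but also shifts the features that carry the text conditioning.
    text_out = attention(q, k_text, v_text)
    id_out = attention(q, k_id, v_id)
    return text_out + scale * id_out
```

Because identity and text share one output inside the attention layer, there is no separate path the model can use to keep them apart, which is the conflict the bullet above describes.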
Concrete Example: When prompted with 'a woman wearing a VR headset' and a reference identity, standard IP-Adapter (IPA)-based methods may paste the face in awkwardly or drop the headset to keep the face intact. InfiniteYou renders the headset correctly while keeping the identity natural.
Key Novelty
InfuseNet: A Parallel Residual Identity Branch
  • Instead of modifying the base model's attention layers (like IP-Adapter), InfuseNet runs as a parallel branch that injects identity features solely through residual connections
  • Treats identity injection as a control signal (similar to ControlNet) rather than a texture override, disentangling it from the text prompts processed by the base model
  • Uses a multi-stage training strategy with synthetic Single-Person-Multiple-Sample (SPMS) data to teach the model robust identity preservation across diverse styles
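The difference between same-image and multi-sample training pairs can be sketched as data preparation. This is a hypothetical illustration, assuming an earlier stage uses single-person-single-sample (SPSS) pairs, where the identity reference and the target are the same image, and a later stage switches to SPMS pairs, where they are different images of the same person. The `Sample` and `TrainingPair` types and their fields are illustrative, not the authors' data format.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    person_id: str  # hypothetical identity label
    image: str      # hypothetical image path or key
    caption: str


@dataclass(frozen=True)
class TrainingPair:
    identity_ref: Sample  # face image fed to the identity branch
    target: Sample        # image the generator is trained to reproduce


def spss_pairs(samples: List[Sample]) -> List[TrainingPair]:
    # Same image as reference and target: the objective can be met by
    # copying the reference face, which encourages copy-paste behaviour.
    return [TrainingPair(s, s) for s in samples]


def spms_pairs(samples: List[Sample], rng: Optional[random.Random] = None) -> List[TrainingPair]:
    # Different images of the same person: pose, expression, style and
    # background must come from the caption, not from the reference.
    rng = rng or random.Random(0)
    by_person = defaultdict(list)
    for s in samples:
        by_person[s.person_id].append(s)
    pairs = []
    for group in by_person.values():
        if len(group) < 2:
            continue  # a person needs at least two samples to form a pair
        for target in group:
            others = [s for s in group if s is not target]
            pairs.append(TrainingPair(rng.choice(others), target))
    return pairs
```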
Architecture
Figure 3: The overall framework of InfiniteYou (InfU), showing the InfuseNet parallel branch interacting with the frozen FLUX base model.
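The parallel-branch idea can be sketched with toy NumPy blocks. This is a minimal sketch under stated assumptions, not InfuseNet itself: the block internals are stand-ins (FLUX's double- and single-stream blocks are collapsed into one generic type), each branch block is assumed to supply the residual for a fixed number of consecutive base blocks, and the zero-initialised output projection follows the ControlNet convention rather than a detail confirmed by this summary. All class and function names are hypothetical.

```python
import numpy as np


class FrozenBaseBlock:
    """Stand-in for a frozen base DiT block (weights fixed, never updated)."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, h: np.ndarray, text: np.ndarray) -> np.ndarray:
        # Conditioned on text only; identity never enters the base block.
        return h + np.tanh(h @ self.w + text.mean(axis=0))


class IdentityBranchBlock:
    """Stand-in for one trainable block of a parallel identity branch."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # Zero-initialised output projection (ControlNet convention, assumed):
        # at initialisation the branch adds nothing to the base model.
        self.out_proj = np.zeros((dim, dim))

    def __call__(self, h: np.ndarray, identity: np.ndarray):
        h = h + np.tanh(h @ self.w + identity.mean(axis=0))
        return h, h @ self.out_proj  # (branch hidden state, residual for base)


def denoise_forward(x, text, identity, base_blocks, branch_blocks):
    if len(base_blocks) % len(branch_blocks) != 0:
        raise ValueError("number of base blocks must be a multiple of branch blocks")
    ratio = len(base_blocks) // len(branch_blocks)

    # 1) Run the identity branch and collect one residual per branch block.
    residuals, h_branch = [], x
    for blk in branch_blocks:
        h_branch, r = blk(h_branch, identity)
        residuals.append(r)

    # 2) Run the frozen base model; residual k is added to the output of
    #    each of the `ratio` base blocks assigned to branch block k.
    h = x
    for i, blk in enumerate(base_blocks):
        h = blk(h, text) + residuals[i // ratio]
    return h
```

With the output projections at zero, the forward pass is exactly the base model's, so identity enters only as an additive residual and the base attention layers are never modified.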
Evaluation Highlights
  • Achieves higher identity similarity (Identity Score) than PuLID-FLUX and the InstantX IP-Adapter on benchmark tests
  • Shows clear qualitative improvements in text-image alignment and aesthetic quality over IP-Adapter methods, which often degrade into copy-paste artifacts
  • Successfully disentangles identity from style, allowing flexible recrafting (e.g., changing age, accessories) where baselines fail
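The summary does not define how the Identity Score is computed. A common convention for identity similarity is the cosine similarity between face-recognition embeddings of the reference and the generated face, averaged over the test set; the sketch below assumes that convention and assumes the embeddings have already been extracted by some face-recognition model (not shown). Function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two embedding vectors, in [-1, 1].
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def identity_score(ref_embeddings, gen_embeddings) -> float:
    # Assumed metric: mean cosine similarity over (reference, generated) pairs.
    if len(ref_embeddings) != len(gen_embeddings) or not ref_embeddings:
        raise ValueError("need equal, non-empty lists of embeddings")
    total = sum(cosine_similarity(r, g) for r, g in zip(ref_embeddings, gen_embeddings))
    return total / len(ref_embeddings)
```

Cosine similarity ignores embedding magnitude, so only the direction of the face embedding, which is what a face-recognition model is trained to make identity-specific, affects the score.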
Breakthrough Assessment
8/10
Effective adaptation of ControlNet-like residual injection for identity preservation in DiTs (FLUX), solving the quality degradation issues of attention-based injection methods.