Evaluation Setup
Qualitative manipulation of visual attributes (style, layout, content) and reconstruction fidelity
Benchmarks:
- Custom attribute transfer tasks (Image editing/Personalization) [New]
Metrics:
- Qualitative visual fidelity
- Attribute disentanglement (visual inspection)
- Statistical methodology: Not explicitly reported in the paper
Main Takeaways
- Diffusion models exhibit a consistent generation order: Layout (low frequency) → Content (medium frequency) → Material/Style (high frequency).
- Removing prompt conditioning during early generation steps (0-400) drastically alters layout and structure, whereas removing it during late steps (700-1000) affects only fine textures.
- ProSpect successfully disentangles these attributes by assigning them to specific timesteps, enabling 'style transfer' that preserves the original layout better than Textual Inversion or DreamBooth.
- The method allows for 'attribute-aware' personalized image generation, producing results with high editability and fidelity from a single image input.
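
The stage-wise behaviour described in these takeaways can be illustrated with a small sketch. It is not the paper's implementation: the stage count (10), the equal-width split of 1000 steps, the step numbering (0 = first generation step, matching the ranges quoted above), and every function name are assumptions made for illustration. Plain strings such as `<p3>` stand in for whatever per-stage conditioning a real system would use.

```python
from typing import List, Optional

TOTAL_STEPS = 1000  # assumed number of generation steps
NUM_STAGES = 10     # assumed number of equal-width stages (hypothetical)


def stage_for_step(step: int, total_steps: int = TOTAL_STEPS,
                   num_stages: int = NUM_STAGES) -> int:
    """Map a generation step (0 = first step) to a stage index."""
    if not 0 <= step < total_steps:
        raise ValueError(f"step {step} outside [0, {total_steps})")
    return step * num_stages // total_steps


def build_schedule(stage_prompts: List[str],
                   total_steps: int = TOTAL_STEPS) -> List[Optional[str]]:
    """Expand one prompt per stage into one prompt per generation step."""
    num_stages = len(stage_prompts)
    return [stage_prompts[stage_for_step(s, total_steps, num_stages)]
            for s in range(total_steps)]


def drop_prompts(schedule: List[Optional[str]], start: int,
                 end: int) -> List[Optional[str]]:
    """Return a copy with conditioning removed (None) for steps in [start, end)."""
    return [None if start <= s < end else p for s, p in enumerate(schedule)]


def mix_stages(layout_source: List[str], style_source: List[str],
               style_from_stage: int) -> List[str]:
    """Keep early-stage prompts from one image and late-stage prompts from another."""
    if len(layout_source) != len(style_source):
        raise ValueError("both prompt lists need one entry per stage")
    return layout_source[:style_from_stage] + style_source[style_from_stage:]


# Placeholder per-stage prompts for two hypothetical source images.
content_prompts = [f"<content{i}>" for i in range(NUM_STAGES)]
style_prompts = [f"<style{i}>" for i in range(NUM_STAGES)]

schedule = build_schedule(content_prompts)
no_early = drop_prompts(schedule, 0, 400)     # ablate the layout-forming steps
no_late = drop_prompts(schedule, 700, 1000)   # ablate the texture-forming steps

# Style-transfer-like mix: layout/content stages from one image, late stages from another.
mixed = mix_stages(content_prompts, style_prompts, style_from_stage=7)
```

In this sketch, `no_early` and `no_late` mirror the two ablations in the second takeaway, and `mixed` mirrors the idea of swapping only the late (material/style) stages while leaving the early (layout) stages untouched.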