Evaluation Setup
The paper evaluates personalized image generation qualitatively and quantitatively, in both single-shot and multi-shot settings.
Benchmarks:
- Custom evaluation set (Personalized Text-to-Image Generation) [New]
Metrics:
- Visual fidelity (Identity preservation)
- Prompt alignment (Text adherence)
- Statistical methodology: Not explicitly reported in the paper
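
The summary does not say how the two metrics are computed. A common way to score both is cosine similarity between embedding vectors, and the sketch below illustrates that approach under loud assumptions: the embeddings are taken as given (for example from a CLIP-style image/text encoder, which is not confirmed by this section), and the function names `visual_fidelity` and `prompt_alignment` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def visual_fidelity(generated: Sequence[Vector], references: Sequence[Vector]) -> float:
    """Assumed proxy for identity preservation: mean pairwise similarity
    between generated-image and reference-image embeddings."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


def prompt_alignment(generated: Sequence[Vector], prompt: Vector) -> float:
    """Assumed proxy for text adherence: mean similarity between each
    generated-image embedding and the prompt embedding."""
    scores = [cosine_similarity(g, prompt) for g in generated]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative (not data from the paper).
    generated = [[1.0, 0.0], [0.0, 1.0]]
    references = [[1.0, 0.0]]
    prompt = [1.0, 0.0]
    print(visual_fidelity(generated, references))
    print(prompt_alignment(generated, prompt))
```

The two scores typically pull against each other: an overfitted model scores high on fidelity but low on alignment, which is why both are reported.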
Main Takeaways
- Standard personalization methods (DreamBooth, Textual Inversion) tend to overfit: the background or style of the reference images leaks into the generated output, so the target prompt is ignored.
- PALP successfully disentangles the subject identity from the reference image context by using the pre-trained model's knowledge of the target prompt as a guide.
- Delta Denoising Score (DDS) works better than standard Score Distillation Sampling (SDS) for this task, because SDS tends to produce over-saturated and less diverse images.
- The method works for both multi-shot and single-shot personalization settings without requiring large-scale pre-training.
- Qualitative results demonstrate the ability to place subjects in complex scenes and styles (e.g., 'sketch', 'Manga drawing') where baseline methods fail to respect the style constraint.
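
For readers unfamiliar with the two score-based objectives compared in the takeaways, their standard gradient forms are given below. These are the general definitions from the works that introduced SDS and DDS, not PALP's exact loss, which this summary does not state. The notation is an assumption made for illustration: $z$ is the image or latent being optimized through parameters $\theta$, $\epsilon_\phi$ is the frozen diffusion denoiser, $y$ is the target prompt, $(\hat{z}, \hat{y})$ is a reference image and prompt pair, and both branches share the same noise $\epsilon$ and timestep $t$.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial z}{\partial \theta}
    \right]

\nabla_\theta \mathcal{L}_{\mathrm{DDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(z_t;\, y,\, t)
                 - \epsilon_\phi(\hat{z}_t;\, \hat{y},\, t)\bigr)\,
      \frac{\partial z}{\partial \theta}
    \right]
```

The only difference is the subtracted term: SDS subtracts the raw noise $\epsilon$, while DDS subtracts the denoiser's prediction on a matched reference branch. That reference term cancels much of the noisy, prompt-independent component of the SDS gradient, which is consistent with the over-saturation and reduced diversity attributed to SDS above.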