Evaluation Setup
User preference studies compare generated samples on three criteria: visual fidelity, structural integrity, and text alignment.
Benchmarks:
- PartiPrompts (Text-to-Image Generation)
- DrawBench (Text-to-Image Generation)
- VBench (Video Generation Evaluation)
Metrics:
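- User Preference Score ((Good - Bad) / Total), reported as a percentage: positive values mean raters preferred APT, and 0 means wins and losses were balanced
- FID (reported, but noted as a less accurate measure of quality)
- VBench Total Score
- Statistical methodology: not explicitly reported in the paper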
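The preference score can be computed from per-comparison ratings as below. This is a minimal sketch assuming a good/same/bad rating scheme in which ties ("same") count toward the total and the result is expressed as a percentage; the function name and the counts in the usage example are hypothetical, not taken from the paper.

```python
def preference_score(good: int, same: int, bad: int) -> float:
    """Preference score in percent: (Good - Bad) / Total * 100.

    Assumptions (hypothetical helper): 'good' counts ratings where APT was
    preferred over the baseline, 'bad' counts ratings where the baseline was
    preferred, and Total = good + same + bad, so ties dilute the score.
    """
    total = good + same + bad
    if total == 0:
        raise ValueError("need at least one rating")
    return 100.0 * (good - bad) / total


if __name__ == "__main__":
    # Hypothetical counts: 50 wins, 30 ties, 20 losses for APT.
    print(preference_score(50, 30, 20))
```

Under this definition the score ranges from -100 (baseline always preferred) to +100 (APT always preferred), which is why the baseline column in the results is fixed at 0: a model compared against itself would have equal wins and losses.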
Key Results
One-step image generation comparison: APT vs. state-of-the-art baselines. APT shows strong visual fidelity.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| User Study (PartiPrompts/DrawBench) | Visual Fidelity Preference | 0 | 35.7 | +35.7 |
| User Study (PartiPrompts/DrawBench) | Visual Fidelity Preference | 0 | 97.8 | +97.8 |
| User Study (PartiPrompts/DrawBench) | Structural Integrity Preference | 0 | -21.5 | -21.5 |
| User Study (PartiPrompts/DrawBench) | Text Alignment Preference | 0 | -28.1 | -28.1 |

Video generation comparison: APT (1-step and 2-step) vs. the original diffusion model (25-step).

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| User Study (Custom Prompts) | Visual Fidelity Preference | 0 | 10.4 | +10.4 |
| User Study (Custom Prompts) | Structural Integrity Preference | 0 | -38.5 | -38.5 |
Main Takeaways
- APT addresses the over-exposed, synthetic appearance common in classifier-free guidance (CFG) diffusion outputs, which is reflected in its higher visual fidelity preference scores.
- Structural integrity and text alignment remain challenges for one-step generation, with APT showing degradation compared to multi-step teachers.
- Approximated R1 regularization is essential: without it, training of the 16B-parameter GAN collapses immediately.
- The method scales to video generation, where other one-step approaches fail, producing 720p, 24fps video in a single step.
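For context on the R1 takeaway: standard R1 penalizes the squared gradient norm of the discriminator on real data, (γ/2)·E‖∇ₓD(x)‖², which needs a second backward pass through D. A perturbation-based penalty avoids that: for ε ~ N(0, I) and small σ, E[(D(x + σε) - D(x))²] ≈ σ²‖∇ₓD(x)‖², because E[(∇ₓD(x)·ε)²] = ‖∇ₓD(x)‖² for standard Gaussian ε. The sketch below checks this relation numerically on a toy discriminator. It assumes that the approximated R1 penalizes the change in discriminator output when real inputs receive small Gaussian perturbations; the toy `discriminator`, σ, and sample count are illustrative assumptions, and the paper's exact formulation, noise scale, and weighting may differ.

```python
import numpy as np


def discriminator(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical toy discriminator D(x) = tanh(w . x); accepts one
    sample of shape (d,) or a batch of shape (n, d)."""
    return np.tanh(x @ w)


def exact_r1(x: np.ndarray, w: np.ndarray) -> float:
    """Exact R1 term ||grad_x D(x)||^2, using the analytic gradient
    grad_x tanh(w . x) = (1 - tanh(w . x)^2) * w."""
    s = np.tanh(x @ w)
    grad = (1.0 - s**2) * w
    return float(grad @ grad)


def approx_r1(x: np.ndarray, w: np.ndarray, sigma: float = 1e-3,
              n_samples: int = 20_000, seed: int = 0) -> float:
    """Perturbation estimate E[(D(x + sigma*eps) - D(x))^2] / sigma^2 with
    eps ~ N(0, I). Needs only forward passes of D, no gradients."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.shape[0]))
    # x broadcasts over the batch of noise vectors: shape (n_samples, d).
    diff = discriminator(x + sigma * eps, w) - discriminator(x, w)
    return float(np.mean(diff**2) / sigma**2)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    w = rng.standard_normal(16) / 4.0   # toy weights, ||w|| roughly 1
    x = rng.standard_normal(16) * 0.5   # one toy "real" sample
    print(f"exact R1 term:  {exact_r1(x, w):.4f}")
    print(f"approx R1 term: {approx_r1(x, w):.4f}")
```

Under this assumption, a training loop would add a weighted penalty on (D(x + σε) - D(x))² for real samples in place of the gradient term. Because only forward passes of D are involved, no higher-order gradients are required, which is one plausible reason such an approximation is practical at the 16B-parameter scale.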