Evaluation Setup
Flow models are fine-tuned on three tasks: Target Image Generation, Image Compression, and Text-Image Alignment
Benchmarks:
- Target Image Generation (Toy task / MNIST) [New]
- Image Compression (Optimization of file size/quality trade-off)
- Text-Image Alignment (Optimizing CLIP score)
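The text-image alignment task optimizes CLIP score. As a reference point, here is a minimal sketch of the standard CLIPScore formula (rescaled cosine similarity, clipped at zero); the placeholder embeddings and the weight w = 2.5 are illustrative assumptions, not values from the paper:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """Standard CLIPScore: w * max(cosine_similarity(image, text), 0)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return w * max(float(image_emb @ text_emb), 0.0)

# Placeholder vectors; in practice these come from a CLIP image/text encoder.
img = np.array([0.6, 0.8, 0.0])
txt = np.array([0.8, 0.6, 0.0])
print(clip_score(img, txt))  # 2.5 * cos = 2.5 * 0.96 = 2.4
```

In practice the embeddings are produced by a pretrained CLIP image and text encoder; the formula above is only the scoring step.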
Metrics:
- Reward Score (e.g., CLIP score, Compression Ratio)
- Diversity Metrics (e.g., Variance, LPIPS)
- Wasserstein Distance (approximated)
- Statistical methodology: Not explicitly reported in the paper
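The Wasserstein distance is reported as an approximation from samples. In one dimension the empirical W1 between equal-size sample sets has a closed form (average absolute difference of sorted samples); the following is a minimal sketch under that assumption, with illustrative Gaussian stand-ins rather than the paper's actual model samples:

```python
import numpy as np

def w1_empirical(x, y):
    """Empirical 1-D Wasserstein-1 distance for equal-size sample sets:
    mean absolute difference of sorted samples (the optimal coupling in 1-D)."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 4000)    # stand-in: samples from the pretrained model
tuned = rng.normal(0.5, 1.0, 4000)  # stand-in: samples after fine-tuning
print(w1_empirical(ref, tuned))     # close to 0.5, the shift between the means
```

For higher-dimensional samples (images), the distance is typically approximated instead, e.g. via entropic regularization (Sinkhorn) or random 1-D projections (sliced Wasserstein).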
Main Takeaways
- Unregularized online reward weighting leads to policy collapse (zero diversity), experimentally confirming the prediction of Lemma 1.
- Wasserstein regularization effectively controls the trade-off between reward maximization and diversity.
- The method closes the online-offline gap and outperforms offline baselines such as Reward-Weighted Regression (RWR), achieving higher rewards.
- Bypassing likelihood calculation makes RL fine-tuning of continuous flow models feasible and efficient.
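The collapse behavior in the first takeaway can be illustrated with a toy numeric sketch. The 1-D Gaussian "policy" and the quadratic reward below are illustrative assumptions, not the paper's setup; the point is only that as exponential reward weighting sharpens, the spread of the weighted distribution shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 5000)   # toy "policy" samples
reward = -(samples - 2.0) ** 2         # toy reward, peaked at x = 2

def reweighted_std(beta):
    """Std-dev of the reward-weighted distribution with weights w ∝ exp(beta * r)."""
    w = np.exp(beta * (reward - reward.max()))  # subtract max for numerical stability
    w /= w.sum()
    mean = np.sum(w * samples)
    return float(np.sqrt(np.sum(w * (samples - mean) ** 2)))

# As beta grows, diversity (std-dev) decays toward zero: all weight concentrates
# on the highest-reward samples, which is the collapse Lemma 1 predicts for
# unregularized online reward weighting.
print([round(reweighted_std(b), 3) for b in (0.0, 1.0, 10.0, 100.0)])
```

A Wasserstein-regularized variant would penalize the distance between the weighted distribution and the original one, keeping the std-dev bounded away from zero at the cost of some reward.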