Evaluation Setup
Simulation on synthetic data with linear subspace structure, and text-to-image generation using Stable Diffusion guided by a reward classifier
Benchmarks:
- Synthetic Linear Subspace (Data generation) [New]
- Text-to-Image (Stable Diffusion) (Conditional Image Generation)
Metrics:
- Subspace Angle $\angle(V, A)$
- Off-support deviation $\|x_{\perp}\|_2$
- Average Reward
- Distribution Shift (Euclidean distance)
- Statistical methodology: Standard deviation over 5 runs reported for simulation.
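The two geometric metrics above can be computed directly from the learned subspace basis. A minimal sketch, assuming `V` (ground-truth basis) and `A` (learned basis) are tall matrices with orthonormal columns; function names are ours, not the paper's:

```python
import numpy as np

def principal_angle(V, A):
    """Largest principal angle (radians) between span(V) and span(A).

    V and A are D x d matrices with orthonormal columns. The singular
    values of V^T A are the cosines of the principal angles, so the
    smallest singular value gives the largest angle.
    """
    s = np.linalg.svd(V.T @ A, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)  # guard against floating-point drift
    return float(np.arccos(s.min()))

def off_support_deviation(x, V):
    """||x_perp||_2: norm of the component of x outside span(V)."""
    x_perp = x - V @ (V.T @ x)  # subtract the orthogonal projection onto span(V)
    return float(np.linalg.norm(x_perp))
```

For a perfectly recovered subspace the angle is zero; any mass a generated sample places outside `span(V)` shows up directly in the off-support deviation.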
Key Results
Simulation results verify the theoretical scaling laws: reward improves linearly with the target until distribution shift and off-support errors dominate.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Synthetic Linear Subspace | Average Reward | 10.0 | 8.5 | -1.5 |
| Synthetic Linear Subspace | Off-support deviation | 0.0 | 1.8 | +1.8 |

Text-to-image experiments with Stable Diffusion show the trade-off between maximizing the predicted reward and maintaining ground-truth quality.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Stable Diffusion v1.5 | Ground Truth Reward | 0.5 | 3.5 | +3.0 |
| Stable Diffusion v1.5 | Prediction Error (Pred - GT) | 0.0 | 3.0 | +3.0 |
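The mechanism behind these numbers is reward-directed guidance: sampling is nudged along the gradient of a learned reward model, and stronger guidance raises the predicted reward at the cost of drifting off the data support. A toy sketch of that dynamic, assuming a linear reward model `r(x) = theta @ x` standing in for the learned classifier (the paper's actual setup uses Stable Diffusion's denoiser, not this toy prior):

```python
import numpy as np

D, steps = 8, 50
theta = np.zeros(D)
theta[0] = 1.0  # direction in which the (toy) reward model increases

def guided_denoise(x, guidance_scale):
    """One toy 'denoising' step: pull x toward the prior mean (0) plus a
    guidance term that follows the reward gradient."""
    prior_step = -0.1 * x       # stand-in for the score of a N(0, I) prior
    reward_grad = theta         # gradient of the linear reward r(x) = theta @ x
    return x + prior_step + guidance_scale * 0.1 * reward_grad

rng = np.random.default_rng(0)
for scale in (0.0, 1.0, 5.0):
    x = rng.standard_normal(D)
    for _ in range(steps):
        x = guided_denoise(x, scale)
    print(f"scale={scale}: predicted reward = {theta @ x:.2f}")
```

The iteration converges toward `scale * theta`, so the predicted reward grows with the guidance scale; the ground-truth reward would not keep pace once samples leave the region the reward model was trained on, which is exactly the decoupling the table shows.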
Main Takeaways
- Theoretical analysis proves that conditional diffusion models can recover the underlying low-dimensional linear subspace of high-dimensional data.
- The 'regret' (suboptimality) of generated samples decomposes into reward estimation error (bandit regret), on-support diffusion error, and off-support extrapolation error.
- There is a phase transition in the error scaling: when the target reward $a$ is below the latent dimension $d$, the error grows linearly in $a$; when $a > d$, it grows quadratically due to lack of data coverage.
- Empirical results confirm that aggressive reward targeting successfully increases predicted rewards but eventually decouples from ground truth rewards due to distribution shift.
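The regret decomposition in the second takeaway can be written schematically as follows (the symbol names are ours; the paper's exact bound and constants may differ):

```latex
\mathrm{Regret}(x) \;\lesssim\;
\underbrace{\epsilon_{\mathrm{reward}}}_{\text{reward estimation error (bandit regret)}}
\;+\;
\underbrace{\epsilon_{\mathrm{on}}}_{\text{on-support diffusion error}}
\;+\;
\underbrace{\epsilon_{\mathrm{off}}}_{\text{off-support extrapolation error}}
```

The third term is the one that dominates under aggressive reward targeting, matching the off-support deviation and prediction-error rows in the results tables.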