DRaFT: Direct Reward Fine-Tuning—the proposed method of backpropagating reward gradients through the diffusion sampling chain
LoRA: Low-Rank Adaptation—a technique to fine-tune models by freezing the pretrained weights and training small, low-rank matrices whose product is added to them
Gradient Checkpointing: A technique to reduce memory usage during backpropagation by not storing all intermediate activations and re-computing them when needed
ReFL: Reward Feedback Learning—a baseline method that updates models using reward gradients computed on a predicted clean image at a random intermediate timestep
DOODL: Direct Optimization of Diffusion Latents—a method that optimizes the input noise latent rather than model parameters
CFG: Classifier-Free Guidance—a technique to improve image-text alignment by linearly combining conditional and unconditional noise predictions
UNet: The neural network architecture typically used in diffusion models to predict noise
RLHF: Reinforcement Learning from Human Feedback—training models using rewards derived from human preferences
PickScore: A reward model trained on human preferences to predict which of two images is preferred
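
As a rough illustration of two entries above, the "linear combination" in CFG and the "low-rank matrices added to" frozen weights in LoRA can be sketched in a few lines of plain Python. The function names (`cfg_combine`, `lora_weight`, `matmul`), the `guidance_scale` and `scale` parameters, and the toy list-based tensors are assumptions made for this sketch; they are not taken from the source or from any particular library.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance on flat noise predictions.

    Returns eps_uncond + w * (eps_cond - eps_uncond); w = 1 recovers the
    conditional prediction, w = 0 the unconditional one.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def matmul(a, b):
    """Plain nested-list matrix product (hypothetical helper for this sketch)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w_frozen, lora_a, lora_b, scale=1.0):
    """Effective LoRA weight: W + scale * (B @ A).

    W (d_out x d_in) stays frozen; only the low-rank factors
    B (d_out x r) and A (r x d_in) would receive gradient updates.
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_frozen, delta)
    ]


if __name__ == "__main__":
    # Toy values, chosen only to make the arithmetic easy to follow.
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
    print(lora_weight([[1, 0], [0, 1]], lora_a=[[3, 4]], lora_b=[[1], [2]]))
```

In this sketch a rank-1 update to a 2x2 weight is stored as 4 numbers, so nothing is saved; the parameter savings only appear when the rank is much smaller than the weight dimensions. With `lora_b` set to all zeros, `lora_weight` returns the frozen weights unchanged, so training can start from the original model.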