VLM: Vision-Language Model—a model capable of processing both image and text inputs to generate text outputs
RLHF: Reinforcement Learning from Human Feedback—fine-tuning models using rewards derived from human preferences
ReFL: Reward Feedback Learning—a method to fine-tune diffusion models by backpropagating reward gradients through the denoising process
DPO: Direct Preference Optimization—an alignment method that optimizes policies directly on preference pairs without an explicit reward model
Flow Matching: A generative modeling paradigm that learns a velocity field to transport a prior distribution to the data distribution, often needing fewer sampling steps than standard diffusion
SFT: Supervised Fine-Tuning—training on labeled input-output pairs
OneReward: The proposed framework using a VLM to generate task-aware reward signals via textual queries
Inpainting: Filling in a missing or masked region of an image
Outpainting: Extending an image beyond its original borders
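
As a concrete illustration of the DPO entry above, the following is a minimal sketch of the standard per-pair DPO loss, assuming the summed log-probabilities of the preferred and rejected outputs are already available under both the trained policy and a frozen reference model. The function name, argument names, and the default beta of 0.1 are hypothetical choices for illustration and do not come from the source.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    All argument names and the default beta are illustrative assumptions.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # output over the rejected one, relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(x)) equals softplus(-x); computed in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: zero margin, so the loss is log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy favors the chosen output more than the reference does: lower loss.
aligned = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

The loss depends only on log-probability ratios against the reference, which is why no explicit reward model appears anywhere in the computation.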