LRM: Latent Reward Model—A reward model built from the diffusion backbone itself that predicts preferences directly from noisy latent images
PRM: Pixel-level Reward Model—A standard reward model (e.g., a CLIP-based scorer) that requires pixel inputs, so latents must be VAE-decoded before they can be scored during diffusion training
LPO: Latent Preference Optimization—The proposed training framework that uses LRM to optimize the diffusion model entirely in latent space
MPCF: Multi-Preference Consistent Filtering—A data cleaning strategy ensuring winning images in a pair outperform losers in multiple metrics (Aesthetics, CLIP score) to guarantee robust preference ordering under noise
VFE: Visual Feature Enhancement—A module in LRM that sharpens the features' focus on text-image alignment by taking the difference between conditional and unconditional intermediate features (similar to classifier-free guidance, CFG)
SPO: Step-by-step Preference Optimization—A baseline method that optimizes step-wise preferences but operates in pixel space
VAE: Variational Autoencoder—The component in Latent Diffusion Models that compresses images into latent space and decodes them back to pixels
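
The MPCF entry above describes a filtering rule that can be illustrated with a minimal sketch. It assumes each preference pair already carries precomputed per-metric scores; the names `PreferencePair`, `is_consistent`, `mpcf_filter` and the metric keys `"aesthetic"` and `"clip"` are hypothetical and chosen only for illustration, not taken from the original work. A pair is kept only if the winner strictly outscores the loser on every listed metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PreferencePair:
    """One (winner, loser) image pair with precomputed metric scores.

    Hypothetical container: the metric names used as dict keys are
    assumptions made for this sketch.
    """
    prompt: str
    winner_scores: Dict[str, float]
    loser_scores: Dict[str, float]


def is_consistent(pair: PreferencePair, metrics: Sequence[str]) -> bool:
    """True if the winner strictly beats the loser on every metric."""
    return all(pair.winner_scores[m] > pair.loser_scores[m] for m in metrics)


def mpcf_filter(
    pairs: List[PreferencePair],
    metrics: Sequence[str] = ("aesthetic", "clip"),
) -> List[PreferencePair]:
    """Keep only pairs whose preference ordering agrees across all metrics."""
    return [p for p in pairs if is_consistent(p, metrics)]


if __name__ == "__main__":
    pairs = [
        # Winner is better on both metrics: kept.
        PreferencePair("a cat", {"aesthetic": 6.1, "clip": 0.31},
                       {"aesthetic": 5.2, "clip": 0.27}),
        # Winner is better on aesthetics but worse on CLIP score: dropped.
        PreferencePair("a dog", {"aesthetic": 6.0, "clip": 0.22},
                       {"aesthetic": 5.0, "clip": 0.30}),
    ]
    kept = mpcf_filter(pairs)
    print([p.prompt for p in kept])
```

Requiring agreement on every metric is the strictest reading of "outperform losers in multiple metrics"; a real pipeline might instead use score margins or a different set of metrics.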