SEIKO: The proposed method, 'Optimistic Finetuning of Diffusion models with KL constraint'.
feasible space: The manifold of valid/meaningful data points (e.g., chemically valid molecules) defined by the support of the pre-trained model.
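drift coefficient: The vector field that sets the deterministic part of the dynamics in the SDE formulation of the diffusion process.
regret guarantee: A theoretical bound on the gap between the reward achieved by the algorithm and that of an optimal strategy, ensuring the algorithm performs nearly as well as the optimum over time.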
uncertainty model: A model that estimates the epistemic uncertainty of the reward prediction, used to encourage exploration of unknown regions.
UCB: Upper Confidence Bound, an algorithmic principle that chooses actions with the highest optimistic estimate (predicted mean plus an uncertainty bonus) to balance exploration and exploitation.
PPO: Proximal Policy Optimization, a standard policy gradient RL algorithm.
SDE: Stochastic Differential Equation, a mathematical framework used to model the diffusion process.
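
To make the UCB entry concrete, here is a minimal sketch of the "mean + uncertainty" rule. It is a generic illustration, not the implementation used by SEIKO: the function names, the weight `beta`, and the example numbers are all assumptions made for this sketch.

```python
def ucb_scores(means, stds, beta=1.0):
    """Optimistic score per candidate: predicted mean plus beta times uncertainty.

    `beta` is a hypothetical exploration weight; beta = 0 reduces to greedy selection.
    """
    return [m + beta * s for m, s in zip(means, stds)]


def select_ucb(means, stds, beta=1.0):
    """Return the index of the candidate with the highest optimistic score."""
    scores = ucb_scores(means, stds, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up reward predictions (mean) and uncertainty estimates (std) for three candidates.
means = [0.50, 0.45, 0.30]
stds = [0.05, 0.20, 0.10]

greedy_choice = max(range(len(means)), key=means.__getitem__)  # ignores uncertainty
optimistic_choice = select_ucb(means, stds, beta=1.0)          # rewards uncertainty
```

With these numbers the greedy rule picks the candidate with the highest predicted mean, while the UCB rule prefers the second candidate, whose larger uncertainty gives it a higher optimistic score. This is the sense in which the uncertainty model encourages exploration of unknown regions.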