
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Xin Luo, Jiahao Wang, Chenyuan Wu, Shitao Xiao, Xiyan Jiang, Defu Lian, Jiajun Zhang, Dong Li, Zheng Liu
University of Science and Technology of China, Institute of Automation, Chinese Academy of Sciences, Beijing Academy of Artificial Intelligence, Zhejiang University
arXiv.org (2025)
MM RL Benchmark

📝 Paper Summary

Instruction-guided image editing · Reinforcement Learning (RL) for vision · Reward modeling for image generation
EditScore is a specialized high-fidelity reward model for image editing that enables effective online reinforcement learning by surpassing general-purpose VLMs in evaluating instruction adherence and quality.
Core Problem
Applying Reinforcement Learning to image editing fails because current reward signals are either too expensive (proprietary VLMs) or inaccurate (open-source VLMs), leading to unstable training or policy collapse.
Why it matters:
  • Current image editing models struggle with complex instructions and often require multiple trial-and-error attempts to get good results.
  • RL has successfully improved text-to-image generation (e.g., for flow-matching models), but image editing lacks the reliable reward oracle needed for similar progress.
  • Even large open-source models like Qwen2.5-VL-72B fail to provide consistent reward signals, stalling open research in RL-based editing.
Concrete Example: When a general-purpose VLM serves as the reward function for RL, the policy often collapses or learns to game the reward, because the VLM cannot reliably distinguish a subtle but correct edit (e.g., 'change background to snowy') from a high-quality but incorrect image. The proposed EditScore, by contrast, correctly identifies such fine-grained errors.
Key Novelty
Specialized Generative Reward Model with Self-Ensembling (EditScore)
  • Fine-tunes a VLM (Qwen2.5-VL) specifically to evaluate image edits by generating both reasoning and scores for Semantic Consistency and Perceptual Quality.
  • Uses an inference-time ensembling strategy where the model generates multiple reasoning paths and scores, averaging them to produce a lower-variance, higher-fidelity reward signal.
  • Establishes a rigorous benchmark (EditReward-Bench) to validate reward model performance against human expert judgments before using it for RL.
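The inference-time ensembling idea can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: `EditScores`, `self_ensemble`, and `make_noisy_scorer` are hypothetical names, and the noisy scorer is a stand-in for one sampled reasoning-and-score pass of the reward model. The intuition is that if K sampled scores were independent with variance σ², their mean would have variance σ²/K, which is why averaging tends to give a steadier reward.

```python
import random
from statistics import mean
from typing import Callable, NamedTuple


class EditScores(NamedTuple):
    """The two dimensions named in the summary (container name is hypothetical)."""
    semantic_consistency: float
    perceptual_quality: float


def self_ensemble(score_once: Callable[[], EditScores], k: int = 4) -> EditScores:
    """Call a stochastic scorer k times and average each dimension."""
    if k < 1:
        raise ValueError("k must be at least 1")
    samples = [score_once() for _ in range(k)]
    return EditScores(
        semantic_consistency=mean(s.semantic_consistency for s in samples),
        perceptual_quality=mean(s.perceptual_quality for s in samples),
    )


def make_noisy_scorer(true_sc: float, true_pq: float,
                      noise: float = 1.0, seed: int = 0) -> Callable[[], EditScores]:
    """Assumed stand-in for one sampled pass of a reward model (not a real API)."""
    rng = random.Random(seed)

    def score_once() -> EditScores:
        return EditScores(true_sc + rng.gauss(0, noise),
                          true_pq + rng.gauss(0, noise))

    return score_once


if __name__ == "__main__":
    scorer = make_noisy_scorer(true_sc=7.0, true_pq=6.0)
    print("single pass:", scorer())
    print("ensemble   :", self_ensemble(scorer, k=8))
```

How the two averaged dimensions are combined into a single scalar reward is not stated in this summary, so the sketch returns them separately.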
Evaluation Highlights
  • EditScore-72B achieves 86.36% accuracy on EditReward-Bench, surpassing GPT-4o (84.41%) and GPT-5 (85.29%).
  • Using EditScore as the reward for online RL training improves the OmniGen2 base model's editing success rate by 14.6% on GEdit-Bench.
  • In Best-of-N selection, EditScore improves the performance of diverse editors (e.g., Qwen-Image-Edit) by picking better outputs than random selection.
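Best-of-N selection, mentioned in the highlights above, amounts to scoring N candidate edits and keeping the highest-scoring one. The sketch below assumes a scalar `reward` callable supplied by the caller; `best_of_n` is a hypothetical helper name, and the dictionary of scores in the usage note is made-up illustrative data, not results from the paper.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], reward: Callable[[T], float]) -> T:
    """Return the candidate with the highest reward (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


if __name__ == "__main__":
    # Illustrative only: file names and scores are invented for the example.
    fake_scores = {"edit_a.png": 5.5, "edit_b.png": 8.0, "edit_c.png": 6.5}
    print(best_of_n(list(fake_scores), fake_scores.__getitem__))
```

In practice the `reward` callable would wrap a reward model such as EditScore, while random selection corresponds to ignoring the reward and picking any candidate.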
Breakthrough Assessment
9/10
Significantly advances RL for image editing by solving the primary bottleneck: the lack of a reliable open-source reward model. Outperforming GPT-5 on the benchmark is a major claim.