
QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model

Junjie Yin, Jiaju Li, Hanfa Xing
arXiv (2026)
MM

📝 Paper Summary

Image Super-Resolution (ISR), Image Restoration
QUSR enhances super-resolution diffusion models by using a multimodal LLM to describe image degradation quality and an uncertainty map to spatially adapt noise injection, balancing detail generation with fidelity.
Core Problem
Real-world super-resolution faces a trade-off between two kinds of guidance: high-level semantic prompts ignore specific degradation details (blur, noise), while low-level image features are corrupted by that same degradation. Either choice can lead to hallucinations or artifacts.
Why it matters:
  • Existing diffusion SR methods struggle with unknown, non-uniform degradations in real-world scenarios
  • Sole reliance on text prompts overlooks critical degradation information necessary for accurate restoration
  • Direct feature extraction from low-quality images transmits noise and artifacts into the final output
Concrete Example: In a real-world image with both flat backgrounds and complex textures, standard diffusion models might over-smooth the textures or hallucinate artifacts in the flat areas because they apply uniform denoising. QUSR detects high uncertainty in the textures and injects stronger noise there to stimulate detail generation, while keeping the background clean.
Key Novelty
Dual-Guidance Framework (Quality-Aware Prior + Uncertainty-Guided Noise)
  • Uses a Multimodal Large Language Model (Qwen2.5-VL) to generate a text description of the image's *quality* (e.g., 'blur', 'noise level'), not just content, providing explicit degradation cues to the diffusion model
  • Estimates a pixel-wise uncertainty map to modulate noise injection: high-uncertainty regions (edges) receive stronger noise to force detail reconstruction, while low-uncertainty regions (flat areas) receive minimal noise to preserve fidelity
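The uncertainty-guided noise idea in the second bullet can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's formulation: the linear mapping from uncertainty to noise scale, the function name `uncertainty_guided_noise`, and the parameters `sigma_min` and `sigma_max` are all hypothetical, and the uncertainty map is assumed to be given rather than estimated.

```python
import numpy as np


def uncertainty_guided_noise(latent, uncertainty, sigma_min=0.1, sigma_max=1.0, rng=None):
    """Add Gaussian noise whose per-pixel scale grows with uncertainty.

    latent:      array of shape (C, H, W)
    uncertainty: array of shape (H, W); any non-negative range
    Returns the noised latent and the per-pixel noise scale of shape (H, W).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Normalise the uncertainty map to [0, 1]; a constant map maps to all zeros.
    u = np.asarray(uncertainty, dtype=np.float64)
    span = u.max() - u.min()
    u = (u - u.min()) / span if span > 0 else np.zeros_like(u)

    # Assumed linear schedule: low uncertainty -> sigma_min, high -> sigma_max.
    sigma = sigma_min + (sigma_max - sigma_min) * u

    # Broadcast the (H, W) scale across the channel axis of the latent.
    noise = rng.standard_normal(latent.shape)
    return latent + sigma[None, :, :] * noise, sigma
```

Calling this with an uncertainty map that is high on textured regions and low on flat ones perturbs the textured regions more strongly, which is the behaviour the summary attributes to QUSR. How the actual model estimates uncertainty and schedules the noise is not specified here.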
Architecture
Figure 1: The overall QUSR framework, illustrating the dual path: (1) Uncertainty Estimation modifying the latent noise, and (2) Qwen2.5-VL generating quality prompts for the UNet.
Evaluation Highlights
  • Reduces FID (Fréchet Inception Distance) by 16.74 compared to the second-best method on the DRealSR dataset
  • Increases MUSIQ (perceptual quality metric) by 0.89 compared to the second-best method on the DRealSR dataset
  • Achieves State-of-the-Art (SOTA) results across all metrics on the DRealSR dataset
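For context on the first metric, FID compares Gaussian fits to Inception features of real and generated images, so a lower value is better, whereas a higher MUSIQ score is better. The standard definition is shown below; the subscripts r and g (real and generated feature statistics) are notation chosen here, not taken from the paper.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```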
Breakthrough Assessment
8/10
Integrates MLLM diagnostics directly into the restoration loop with spatially adaptive diffusion, effectively addressing the 'blind' nature of real-world super-resolution. Strong quantitative gains on real-world benchmarks.