
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa
The University of Tokyo
arXiv (2024)
MM Benchmark

📝 Paper Summary

Open-World Visual Recognition Safety AI
The emergence of Vision Language Models (VLMs) has consolidated five previously distinct open-world recognition tasks into two primary active fields, Sensory Anomaly Detection and OOD Detection, leaving Open Set Recognition and Novelty Detection largely redundant.
Core Problem
Prior to VLMs, tasks like Anomaly Detection, Novelty Detection, and Open Set Recognition had confusing overlaps in definition; VLMs like CLIP have blurred these boundaries further, leaving researchers unsure which problem settings remain non-trivial.
Why it matters:
  • Researchers are wasting effort on fields such as Open Set Recognition (OSR), which have become conceptually redundant in the VLM era
  • Subtle definitional differences between the five sub-fields (Anomaly Detection (AD), Novelty Detection (ND), OSR, Outlier Detection (OD), and OOD Detection) cause confusion and prevent unified progress
  • The capabilities of CLIP (e.g., zero-shot classification) trivially solve some previously 'hard' problems (such as semantic novelty detection), which calls for a shift in research focus
Concrete Example: Previously, Open Set Recognition (OSR) was distinguished from OOD detection by specific benchmarks (e.g., splitting CIFAR-10). However, VLM-based OOD detection now uses identical setups (e.g., ImageNet-10 as ID), effectively merging OSR into OOD detection, yet some researchers still treat them as separate.
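To make the VLM-based setup concrete, here is a minimal sketch of how a zero-shot OOD score could be computed from image and class-text embeddings. It is a generic max-softmax-over-similarity score, not a method taken from the survey. The function names, the temperature value, the threshold, and the toy 3-dimensional vectors are all assumptions for illustration; a real pipeline would obtain the embeddings from a VLM such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_softmax_score(image_emb, class_text_embs, temperature=0.1):
    """Highest softmax probability over image-to-class-text similarities.

    A high score means the image matches one ID class much better than
    the others; a low score suggests the image may be OOD.
    """
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    return max(exps) / sum(exps)


def is_ood(image_emb, class_text_embs, threshold, temperature=0.1):
    """Flag an image as OOD when its score falls below the threshold."""
    return max_softmax_score(image_emb, class_text_embs, temperature) < threshold


# Toy stand-ins for text embeddings of two ID class names (hypothetical values).
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
id_image = [0.9, 0.1, 0.0]    # close to the first class
ood_image = [0.1, 0.1, 1.0]   # equally far from both classes

score_id = max_softmax_score(id_image, class_text_embs)
score_ood = max_softmax_score(ood_image, class_text_embs)
```

Because the ID classes are specified only by their text embeddings, the same scoring function applies whether the benchmark is called OSR or OOD detection, which is the overlap the example above describes.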
Key Novelty
Generalized OOD Detection v2 Framework
  • Proposes a new taxonomy that re-evaluates five fields (AD, ND, OSR, OOD, OD) specifically through the lens of VLM capabilities
  • Identifies that the field has consolidated: Semantic AD/ND and OSR are becoming inactive or integrated, while Sensory AD and OOD Detection remain the distinct, demanding challenges
  • Categorizes tasks based on distribution shift type (covariate vs. semantic) and the necessity of ID classification
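The two categorization axes can be sketched as a small data structure. The class and function names below are hypothetical, and the per-field assignments follow the commonly cited generalized OOD detection taxonomy rather than anything stated in this summary, so treat them as assumptions. Outlier Detection (OD) is omitted because the summary does not say where it sits on these two axes.

```python
from dataclasses import dataclass
from enum import Enum


class Shift(Enum):
    COVARIATE = "covariate"  # appearance changes, labels stay the same
    SEMANTIC = "semantic"    # new, unseen categories appear


@dataclass(frozen=True)
class Task:
    name: str
    shift: Shift
    needs_id_classification: bool


# Assumed placement of each field on the two axes (illustrative only).
TASKS = [
    Task("Sensory AD", Shift.COVARIATE, False),
    Task("Semantic AD/ND", Shift.SEMANTIC, False),
    Task("OSR", Shift.SEMANTIC, True),
    Task("OOD Detection", Shift.SEMANTIC, True),
]


def group_by_axes(tasks):
    """Group task names by (shift type, whether ID classification is needed)."""
    groups = {}
    for task in tasks:
        key = (task.shift, task.needs_id_classification)
        groups.setdefault(key, []).append(task.name)
    return groups


groups = group_by_axes(TASKS)
for (shift, needs_cls), names in groups.items():
    print(f"{shift.value:9s} | ID classification: {needs_cls!s:5s} | {names}")
```

Under these assumed assignments, OSR and OOD Detection fall into the same cell, which is consistent with the claim that OSR has been absorbed into OOD detection.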
Architecture
Figure 2: The evolution of the five open-world problems (Sensory AD, Semantic AD/ND, OSR, OOD Detection, OD) into the proposed 'Generalized OOD Detection v2' framework.
Evaluation Highlights
  • OOD Detection remains highly active with 26 top-venue papers (e.g., NeurIPS, CVPR) identified between 2021 and 2025
  • Sensory AD has grown significantly with 22 top-venue papers, distinguishing it as a key surviving challenge in the VLM era
  • Open Set Recognition (OSR) has declined to near-obsolescence with only 1 top-venue paper in the VLM era, indicating its integration into OOD detection
Breakthrough Assessment
9/10
This survey provides a much-needed structural reset for a confused field. By declaring specific sub-fields (like OSR) effectively 'dead' or integrated, it helps researchers direct effort toward the problems that remain open.