CHAIR: Captioning Hallucination Assessment with Image Relevance—a standard metric that detects hallucinated objects by string-matching the objects mentioned in a caption against a fixed list (the MS COCO classes)
S-BERT: Sentence-BERT—a modification of the BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings
Hungarian matching: An optimization algorithm that solves the assignment problem (finding the best pairing between two sets) in polynomial time
DETR: DEtection TRansformer—an end-to-end object detection model that uses transformers
HAT: HAllucination Test—a new gold-standard dataset introduced in this paper, annotated by experts for hallucinations in captions
nocaps: Novel Object Captioning at Scale—a benchmark dataset for image captioning involving objects not seen in the COCO training set
FOIL: A dataset where objects in captions are replaced with similar 'foil' objects to test hallucination detection
AP: Average Precision—a metric measuring the area under the precision-recall curve
LA: Localization Accuracy—the accuracy of correctly indicating exactly which object in a caption is hallucinated
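
To make the CHAIR entry above concrete, here is a minimal sketch of the string-matching idea. It assumes a tiny hypothetical vocabulary (VOCAB) in place of the full MS COCO class list and its synonym table, and hypothetical helper names (mentioned_objects, chair); it is not the reference implementation. It reports the two usual variants: the fraction of mentioned objects that are hallucinated (CHAIR_i) and the fraction of captions containing at least one hallucinated object (CHAIR_s).

```python
import re

# Hypothetical, tiny stand-in for the fixed object list. The real metric
# uses the MS COCO classes together with a synonym list.
VOCAB = {"dog", "cat", "frisbee", "bench", "car"}


def mentioned_objects(caption, vocab=VOCAB):
    """Return the vocabulary objects found in a caption by exact word match."""
    words = re.findall(r"[a-z]+", caption.lower())
    return [w for w in words if w in vocab]


def chair(captions, image_objects, vocab=VOCAB):
    """Return (CHAIR_i, CHAIR_s) for captions paired with ground-truth object sets."""
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for caption, present in zip(captions, image_objects):
        mentions = mentioned_objects(caption, vocab)
        # A mention is hallucinated if the object is not actually in the image.
        wrong = [m for m in mentions if m not in present]
        total_mentions += len(mentions)
        hallucinated_mentions += len(wrong)
        hallucinated_captions += bool(wrong)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(captions) if captions else 0.0
    return chair_i, chair_s


# Made-up example data: the first caption mentions a bench that is not in the image.
captions = ["A dog catches a frisbee near a bench.", "A cat sits on a car."]
image_objects = [{"dog", "frisbee"}, {"cat", "car"}]
chair_i, chair_s = chair(captions, image_objects)
print(chair_i, chair_s)
```

In this sketch, a word outside the fixed list (for example "puppy") is never matched, so it can be neither credited nor flagged. That is the practical limitation of matching against a closed vocabulary.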