Red Teaming: The practice of rigorously challenging a system to identify vulnerabilities, biases, and safety flaws
RTVLM: Red Teaming Visual Language Model—the dataset and benchmark proposed in this paper
SFT: Supervised Fine-Tuning, i.e. further training of a pre-trained model on a task-specific labeled dataset to improve its performance or alignment
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pre-trained weights and trains only small low-rank matrices added alongside them
GPT-4V: GPT-4 with Vision—a multimodal version of GPT-4 capable of processing image and text inputs
Self-Instruct: A method where a strong language model (like GPT-4) generates training or testing examples based on a few human-written seed examples
MM-Hallu: A benchmark specifically designed to measure hallucination (generating false or non-existent information) in multimodal models
CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart, a challenge-response test (often an image of distorted text) used to block automated access, which VLMs should arguably refuse to solve for safety reasons
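
As an illustration of the LoRA entry above, here is a minimal pure-Python sketch of the low-rank update for a single weight matrix. It is not the paper's training code. The sizes, the scaling factor alpha, and the helper names (matvec, lora_forward, trainable_params) are assumptions chosen for the example. The frozen weight W is left untouched, and only the two small matrices A and B would be trained.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x). W is frozen; A and B are trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one d_out x d_in matrix."""
    return d_out * d_in, r * (d_out + d_in)

random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # illustrative sizes, not from the paper

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable
B = [[0.0] * r for _ in range(d_out)]                                   # trainable, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x, alpha, r)
full, lora = trainable_params(d_out, d_in, r)
```

Because B starts at zero, the adapted layer initially reproduces the frozen model's output exactly. The saving comes from r being much smaller than the matrix dimensions: here LoRA trains 28 values instead of 48, and the gap widens quickly at realistic layer sizes.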