← Back to Paper List

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang
University of Illinois Urbana-Champaign, Salesforce Research, University of Waterloo
Neural Information Processing Systems (2024)
MM Factuality Benchmark

📝 Paper Summary

Adversarial Attacks on VLMs Data Poisoning
Shadowcast poisons VLM training data with visually indistinguishable image-text pairs that manipulate the model into generating misleading narratives or incorrect labels for specific visual concepts.
Core Problem
VLMs rely on massive, uncurated web datasets for training, making them vulnerable to data poisoning where adversaries inject malicious samples to manipulate model behavior.
Why it matters:
  • Existing attacks like jailbreaking only work at test-time with obvious adversarial prompts, while poisoning affects benign users interacting normally
  • Traditional poisoning attacks focus on simple label flipping, but VLMs' text generation capabilities allow for more dangerous 'Persuasion Attacks' that spread convincing misinformation
  • Current multimodal poisoning often uses mismatched image-text pairs ('dirty-label'), which are easily detected by human inspection
Concrete Example: A VLM poisoned by Shadowcast might see a benign image of 'junk food' (original concept) and, instead of describing it accurately, generate a persuasive narrative claiming it is 'healthy food rich in nutrients' (destination concept), effectively misleading the user.
Key Novelty
Stealthy VLM Poisoning via Concept-Matching Perturbations and Refined Captions
  • Crafts poison images by applying imperceptible perturbations to images of a 'destination concept' so they mimic the latent features of an 'original concept' in the vision encoder space
  • Generates poison texts that are visually consistent with the destination images but are refined by an LLM to strongly emphasize the target misinformation, ensuring the poison samples look benign to humans
Architecture
Architecture Figure Figure 2
The Shadowcast pipeline for crafting poison samples.
Evaluation Highlights
  • Achieves strong attack success rates with as few as 50 poison samples (approx 0.1% or less of finetuning data)
  • Demonstrates transferability in black-box settings, successfully attacking LLaVA-1.5 using poison samples crafted on a different VLM (InstructBLIP)
  • Effectively bypasses common defenses, maintaining potency under data augmentation and image compression techniques
Breakthrough Assessment
8/10
First work to demonstrate stealthy 'clean-label' poisoning on VLMs that leverages text generation for persuasion attacks, showing high effectiveness with very few samples.
×