Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

📝 Paper Summary

Multimodal Privacy Vision-Language Model Safety

VLM-GEOPRIVACY is a benchmark evaluating whether vision-language models can infer social context from images to determine appropriate location disclosure levels, revealing that current models frequently over-disclose sensitive information.

Core Problem

Vision-language models (VLMs) have achieved superhuman geolocation capabilities, enabling them to pinpoint precise locations from casual photos without considering the privacy risks or social norms governing disclosure.

Why it matters:

Malicious actors can exploit VLMs for large-scale surveillance, doxxing, or stalking by inferring sensitive locations (e.g., private homes, protests) from seemingly innocuous images.
Current guardrails use blanket restrictions (e.g., 'never reveal city') that destroy utility for valid uses (e.g., famous landmarks) while failing to protect privacy in sensitive contexts (e.g., protests).
Privacy is not binary; it depends on 'contextual integrity'—whether the flow of information is appropriate for the specific social context and user intent, which current models fail to reason about.

Concrete Example: A VLM might correctly identify the exact street corner of a political protest photo, endangering participants, while refusing to identify a famous tourist landmark like the Eiffel Tower replica in Las Vegas, annoying a user. The model fails to distinguish that the protest requires privacy (abstention) while the landmark implies public sharing intent.

Key Novelty

VLM-GEOPRIVACY Benchmark & Contextual Integrity Evaluation

First benchmark explicitly designed to test 'contextual integrity' in multimodal geolocation—evaluating not just *if* a model can locate an image, but *if it should* based on visual cues (e.g., bystanders, protests).
Introduces a multi-aspect evaluation framework comprising structured judgment (choosing disclosure levels based on context) and free-form reasoning (resisting benign and adversarial attempts to extract location).

Architecture

Conceptual overview of the Contextual Integrity challenge in geolocation. Shows two images (landmark vs. protest) where appropriate disclosure differs (valid disclosure vs. valid refusal) and how VLMs can fail (under-disclosure vs. over-disclosure).

Evaluation Highlights

GPT-5 over-discloses location information 47.6% of the time in vanilla prompting settings, failing to withhold sensitive data.
Best performing model (o3) matches human judgment on appropriate disclosure granularity only 49.7% of the time in free-form generation.
Adversarial 'malicious prompting' (embedding instructions in images) increases privacy leakage, with models like Gemini-2.5-Flash exposing exact locations in 95.6% of sensitive cases.

Breakthrough Assessment

9/10

Pioneering benchmark that shifts the focus from 'capability' to 'contextual responsibility' in geolocation. Exposes a critical safety gap in frontier models (GPT-5, o3) that current alignment techniques have missed.

⚙️ Technical Details

Problem Definition

Setting: Given an image I, determine the appropriate granularity of location disclosure G ∈ {Abstain, Country/City, Exact Coordinates} based on latent social norms and privacy context.

Inputs: Real-world image I containing potential geolocation cues and social context indicators (people, activities).

Outputs: A decision on disclosure granularity and, if appropriate, the predicted location.

Pipeline Flow

Task 1: Contextual Integrity Judgment (Structured MCQ)
Task 2: Privacy Preserving Free-Form Geolocation Reasoning (Generation)

System Modules

Structured MCQ Evaluator

Assess model's ability to recognize privacy cues via multiple-choice questions

Model or implementation: Target VLM (e.g., GPT-4o, o3, Llama-3.2)

Free-Form Generator

Generate location responses under different prompting conditions

Model or implementation: Target VLM

Granularity Judge

Map free-form model responses to discrete granularity levels

Model or implementation: GPT-4.1-mini (as Judge)

Novel Architectural Elements

Two-stage evaluation pipeline combining structured diagnostic questions (for context understanding) with open-ended adversarial stress-testing (for behavioral safety).
Integration of a 'Rule of Thumb' into the prompt for structured judgment to test alignment with explicit norms vs. latent behavior.

Modeling

Base Model: Evaluated 14 models including GPT-5, o3, o4-mini, Gemini-2.5-Flash, Claude-Sonnet-4, GPT-4o, Llama-4-Maverick, Qwen2.5-VL.

Comparison to Prior Work

vs. GeoCLIP/PIGEON: Evaluates privacy/norm adherence ('should I locate?'), not just raw geolocation accuracy ('where is it?').
vs. GPTGeoChat: Uses context-dependent norms (Contextual Integrity) rather than fixed system-wide granularity settings.
vs. DOXBENCH: VLM-GEOPRIVACY explicitly models 'intent' and 'context' via structured annotations and MCQs, whereas DOXBENCH focuses purely on adversarial leakage [not cited in paper].

Limitations

Reliance on proprietary models (GPT-4.1-mini) as judges for granularity extraction, though validated with high human agreement.
Dataset size (1,200 images) is moderate compared to massive pre-training corpora, though large for manual privacy annotation.
Focuses on visual privacy norms which may vary culturally; annotations reflect the specific guidelines derived from Western privacy laws (GDPR/COPPA).

Reproducibility

Code: https://github.com/99starman/VLM-GeoPrivacyBench

Benchmark dataset (1,200 images with annotations) and code are publicly available. Images curated from YFCC100M, IM2GPS-3k, and GPTGeoChat. Evaluation prompts and detailed annotation guidelines provided in appendix. Commercial models (GPT-5, o3) accessed via API; open weights models (Llama, Qwen) available via HuggingFace.

📊 Experiments & Results

Evaluation Setup

Zero-shot and CoT evaluation on 1,200 annotated real-world images.

Benchmarks:

VLM-GEOPRIVACY (Privacy-aware Geolocation Judgment & Generation) [New]

Metrics:

MCQ Accuracy (Context, Intent, Granularity)
Over-disclosure Rate (Model reveals more than human annotator deems safe)
Under-disclosure Rate (Model reveals less than is safe/useful)
Contextualized Location Exposure Rate (Exact location revealed despite no sharing intent)
Geolocation Accuracy (Street <1km, City <25km, Region <200km)
Statistical methodology: Krippendorff's alpha (0.83) reported for human annotation agreement. No statistical significance tests reported for model comparisons.

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Free-form generation results reveal a significant gap between model behavior and human privacy expectations.
VLM-GEOPRIVACY	Extracted Granularity Accuracy	33.3	49.7	+16.4
VLM-GEOPRIVACY	Over-Disclosure Rate (Vanilla Prompt)	11.5	47.6	+36.1
VLM-GEOPRIVACY	Contextualized Location Exposure Rate (Malicious Prompt)	16.8	96.4	+79.6
VLM-GEOPRIVACY	Street-level Accuracy (<1km)	9.8	28.7	+18.9

Experiment Figures

Privacy-utility tradeoff scatter plots for Vanilla, Iterative CoT, and Malicious settings.

Analysis of how sensitive factors (Face Visibility, Sharing Intent) affect model disclosure rates compared to humans.

Main Takeaways

Models exhibit a poor privacy-utility tradeoff: they either over-disclose dangerously (GPT-5, Gemini) or under-disclose excessively (Claude Sonnet 4).
Adversarial and iterative prompting significantly degrade privacy protections; most models (except o3) collapse to near-total disclosure under malicious prompts.
Models fail to adjust behavior based on sensitive cues; they exhibit high over-disclosure rates even when faces are visible or sharing intent is absent.
Reasoning models (o3, o4) do not inherently respect contextual integrity better than standard models, though o3 shows unique robustness to image-based jailbreaks.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Vision-Language Models (VLMs)
Basic concepts of privacy and geolocation
Familiarity with prompting strategies (Zero-shot, CoT)

Key Terms

Contextual Integrity: A theory of privacy defining it not as secrecy, but as the appropriate flow of information according to social norms and context (e.g., medical data to a doctor vs. a marketer).

VLM: Vision-Language Model—AI models that can process and reason about both images and text.

Geolocation: The process of determining the real-world geographic location of an object or image.

Over-disclosure: When a model reveals more precise location information than is appropriate for the privacy context (e.g., giving exact coordinates of a private home).

Under-disclosure: When a model withholds location information that would be safe and useful to share (e.g., refusing to identify a public landmark).

CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps before the final answer.

FigStep: A visual adversarial prompting method that embeds sensitive text instructions within an image to bypass text-based safety filters.

MLRM: Multimodal Large Reasoning Model—newer generation VLMs (like o3) capable of complex reasoning chains.

Krippendorff's alpha: A statistical measure of the agreement achieved when coding a set of units of analysis (inter-annotator agreement).