← Back to Paper List

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins

Umar Iqbal, Tadayoshi Kohno, Franziska Roesner
Washington University in St. Louis, University of Washington
arXiv (2023)
Agent Benchmark

📝 Paper Summary

LLM Platform Security Threat Modeling for Agentic Systems
The paper proposes a systematic threat modeling framework to uncover security, privacy, and safety risks in LLM ecosystems where third-party plugins interact with users and the platform via natural language.
Core Problem
LLM platforms like ChatGPT are evolving into computing platforms with third-party app ecosystems, but these integrations introduce untrusted code and ambiguous natural language interfaces that existing security models do not fully address.
Why it matters:
  • Third-party plugins are developed by arbitrary entities and cannot be implicitly trusted, yet they gain access to user data and the LLM context
  • Natural language interfaces create unique vulnerabilities where ambiguous plugin descriptions can lead to unauthorized actions or confusion
  • Current platform restrictions (like HTTPS enforcement) are insufficient to prevent semantic attacks like prompt injection or session hijacking
Concrete Example: A plugin named 'AMZPRO' instructed ChatGPT in its hidden description to always reply in English. Even when the user did not invoke the plugin, ChatGPT followed this instruction for the entire session, demonstrating how a plugin can hijack the global conversation context.
Key Novelty
Taxonomy of Attacks for LLM Plugin Ecosystems
  • Develops a comprehensive attack taxonomy by analyzing relationships between three stakeholders: the User, the Plugin, and the LLM Platform
  • Identifies novel attack surfaces arising from 'natural language programming,' where plugins define functionality via text that the LLM interprets loosely
  • Empirically validates theoretical threats by analyzing real-world plugins on OpenAI's store, discovering actual instances of credential theft, history sniffing, and session hijacking
Architecture
Architecture Figure Figure 1
Life cycle of a user command to LLM that requires use of a plugin
Evaluation Highlights
  • Identified 35 plugins capable of harvesting user data by mandating unnecessary account logins or defining overly broad API specifications
  • Discovered 6 plugins capable of hijacking the LLM platform session via 'instruction injection' in their manifests (e.g., forcing specific behaviors)
  • Found 2 plugins (AutoInfra1, ChatSSHPlug) that solicit critical credentials like SSH private keys or passwords directly from users
Breakthrough Assessment
8/10
Significant contribution as one of the first formal threat modeling frameworks for LLM application ecosystems. It moves beyond simple prompt injection to system-level architectural risks.
×