LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins

📝 Paper Summary

LLM Platform Security Threat Modeling for Agentic Systems

The paper proposes a systematic threat modeling framework to uncover security, privacy, and safety risks in LLM ecosystems where third-party plugins interact with users and the platform via natural language.

Core Problem

LLM platforms like ChatGPT are evolving into computing platforms with third-party app ecosystems, but these integrations introduce untrusted code and ambiguous natural language interfaces that existing security models do not fully address.

Why it matters:

Third-party plugins are developed by arbitrary entities and cannot be implicitly trusted, yet they gain access to user data and the LLM context
Natural language interfaces create unique vulnerabilities where ambiguous plugin descriptions can lead to unauthorized actions or confusion
Current platform restrictions (like HTTPS enforcement) are insufficient to prevent semantic attacks like prompt injection or session hijacking

Concrete Example: A plugin named 'AMZPRO' instructed ChatGPT in its hidden description to always reply in English. Even when the user did not invoke the plugin, ChatGPT followed this instruction for the entire session, demonstrating how a plugin can hijack the global conversation context.

Key Novelty

Taxonomy of Attacks for LLM Plugin Ecosystems

Develops a comprehensive attack taxonomy by analyzing relationships between three stakeholders: the User, the Plugin, and the LLM Platform
Identifies novel attack surfaces arising from 'natural language programming,' where plugins define functionality via text that the LLM interprets loosely
Empirically validates theoretical threats by analyzing real-world plugins on OpenAI's store, discovering actual instances of credential theft, history sniffing, and session hijacking

Architecture

Life cycle of a user command to LLM that requires use of a plugin

Evaluation Highlights

Identified 35 plugins capable of harvesting user data by mandating unnecessary account logins or defining overly broad API specifications
Discovered 6 plugins capable of hijacking the LLM platform session via 'instruction injection' in their manifests (e.g., forcing specific behaviors)
Found 2 plugins (AutoInfra1, ChatSSHPlug) that solicit critical credentials like SSH private keys or passwords directly from users

Breakthrough Assessment

8/10

Significant contribution as one of the first formal threat modeling frameworks for LLM application ecosystems. It moves beyond simple prompt injection to system-level architectural risks.

⚙️ Technical Details

Problem Definition

Setting: Security analysis of an LLM-based computing platform supporting a third-party plugin ecosystem

Inputs: Plugin manifests, API specifications, and User prompts

Outputs: Taxonomy of potential attacks and empirical evidence of vulnerabilities

Pipeline Flow

Taxonomy Formulation (Iterative derivation of attacks)
Data Collection (Crawling OpenAI Plugin Store)
Static Analysis (Reviewing manifests and API specs)
Dynamic Analysis (Interacting with plugins to verify risks)

System Modules

Taxonomy Formulation

Derive potential attacks based on capabilities of Users, Plugins, and the Platform

Plugin Crawler

Download manifests and API specifications for all available plugins

Risk Verification

Manually verify if plugins exhibit theoretical risks via installation and interaction

Novel Architectural Elements

Iterative threat modeling loop: theoretical taxonomy generation ↔ empirical validation against live ecosystem
Analysis of 'Natural Language' as an attack vector in system integration (ambiguity in plugin descriptions)

Comparison to Prior Work

vs. Web/Mobile models: Highlights that traditional sandboxing fails because the 'interface' is natural language, which is hard to isolate or validate rigorously
vs. Greshake et al.: Expands scope from single-prompt attacks to persistent ecosystem risks like plugin squatting, history sniffing, and platform-level session hijacking
vs. OpenAI Safety Systems: Demonstrates that static policies (HTTPS, brand guidelines) do not prevent semantic attacks via plugin descriptions [not cited in paper]

Limitations

Analysis is snapshot-based (June 2023) and plugin behaviors change rapidly
Did not perform automated large-scale vulnerability scanning; relied on manual verification of potential risks
Some risks (e.g., server-side data sharing between plugins) are theoretical and could not be verified without backend access

Reproducibility

The framework is conceptual and fully described. The specific plugins analyzed (e.g., AutoInfra1, AMZPRO) may have been removed or updated since the study (June 2023). Code for crawling is not provided.

📊 Experiments & Results

Evaluation Setup

Qualitative security assessment of the OpenAI ChatGPT Plugin ecosystem

Benchmarks:

OpenAI Plugin Store (Real-world ecosystem analysis)

Metrics:

Presence of vulnerable/malicious logic
Feasibility of attack execution
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
The authors applied their taxonomy to 268 OpenAI plugins and found concrete evidence for multiple attack categories.
OpenAI Plugin Store	Count of plugins harvesting data	0	35	+35
OpenAI Plugin Store	Count of plugins hijacking account/machine	0	29	+29
OpenAI Plugin Store	Count of plugins hijacking LLM Platform	0	6	+6
OpenAI Plugin Store	Count of plugins manipulating users	0	37	+37
OpenAI Plugin Store	Count of plugins squatting others	0	26	+26

Experiment Figures

User interaction with the AutoInfra1 plugin

Main Takeaways

Trust assumptions are broken: Plugins are treated as semi-trusted components but can be malicious, requesting SSH keys or exfiltrating chat history.
Natural language is a porous interface: 'Description-for-model' fields allow plugins to reprogram the LLM's behavior (session hijacking) invisible to the user.
Plugin Squatting is a major risk: Multiple plugins with identical code bases were found, allowing malicious clones to steal user prompts intended for legitimate services.
Platform controls are insufficient: Existing reviews did not catch plugins that violate basic security principles, such as asking for cleartext passwords.

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based application architectures (e.g., ChatGPT Plugins)
Basic computer security concepts (Threat Modeling, Attack Trees)
Familiarity with prompt injection and indirect prompt injection

Key Terms

Plugin Manifest: A JSON file provided by a developer describing the plugin's metadata, authentication, and natural language description for the model

Threat Modeling: A systematic process to identify structural vulnerabilities and potential attacks in a system design

Prompt Injection: An attack where malicious instructions are inserted into the input stream to override the LLM's original instructions

Squatting: The act of mimicking a legitimate service or plugin name to trick users or the LLM into using a malicious version

Instruction Injection: A variant of prompt injection where the plugin's system description (hidden from the user) overrides the LLM's global behavior

History Sniffing: An attack where a plugin extracts the user's conversation history with the LLM, potentially violating privacy