Prompt Injection: An attack where malicious instructions are disguised as data (e.g., in a webpage or email) to trick an LLM into overriding its original instructions
Jailbreak: A type of adversarial prompt crafted to bypass an LLM's safety training (e.g., 'ignore all safety rules')
Chain-of-Thought (CoT): A prompting technique where the model generates intermediate reasoning steps before producing a final answer
Static Analysis: Analyzing code for bugs or vulnerabilities without executing it, often by matching patterns against the source text or its abstract syntax tree (AST)
Semgrep: An open-source static analysis tool that finds bugs and vulnerabilities using rules whose patterns are written to look like the source code being scanned
ASR: Attack Success Rate, the percentage of adversarial attempts that cause the model to misbehave
Indirect Injection: A prompt injection attack delivered via a third-party source (like a webpage the agent reads) rather than directly by the user
BERT: Bidirectional Encoder Representations from Transformers, an encoder-only Transformer pretrained with masked language modeling so that each token's representation draws on context from both directions; often fine-tuned for classification
CoT Auditor: A secondary model that inspects the primary agent's Chain-of-Thought to verify it hasn't been hijacked
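To make the Static Analysis entry concrete, here is a minimal sketch that walks a Python syntax tree and flags direct calls to eval or exec without ever running the code. It uses Python's built-in ast module rather than Semgrep; the function name find_dangerous_calls and the DANGEROUS_CALLS set are illustrative assumptions, not part of any tool.

```python
import ast

# Illustrative set of "dangerous" callees; an assumption for this sketch, not a standard list.
DANGEROUS_CALLS = {"eval", "exec"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a name in DANGEROUS_CALLS.

    The source is only parsed into an AST, never executed.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        # Match calls whose callee is a bare name, e.g. eval(...), but not obj.eval(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Hypothetical code under analysis; input() here is never called.
sample = """
user_input = input()
result = eval(user_input)
print(result)
exec("x = 1")
"""

print(find_dangerous_calls(sample))  # [(3, 'eval'), (5, 'exec')]
```

A rule-based tool such as Semgrep expresses the same idea declaratively as a pattern, which is easier to maintain than hand-written tree walks once there are many rules.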
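The ASR entry reduces to simple arithmetic: successful attacks divided by total attempts, times 100. A minimal sketch, assuming each attempt has already been labeled True (the model misbehaved) or False; the function name attack_success_rate and the sample results are hypothetical.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return ASR as a percentage: successful attacks / total attempts * 100.

    Each entry is True if that adversarial attempt made the model misbehave.
    """
    if not outcomes:
        raise ValueError("ASR is undefined for zero attack attempts")
    # sum() counts True as 1 and False as 0.
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical evaluation: 3 of 8 injected prompts succeeded.
results = [True, False, False, True, False, False, True, False]
print(f"ASR = {attack_success_rate(results):.1f}%")  # ASR = 37.5%
```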
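The CoT Auditor entry describes a gate between the agent's reasoning and its actions. The sketch below shows only that control flow, under stated assumptions: the auditor is any callable that takes the reasoning text and returns True if it looks hijacked. A real CoT Auditor would be a secondary model; keyword_auditor is a deliberately crude stand-in, and AgentStep, gated_action, and the phrase list are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# An auditor reads the agent's reasoning and returns True if it looks hijacked.
# In the glossary's sense this is a secondary model; here any callable fits.
Auditor = Callable[[str], bool]

# Stand-in heuristic only; a real auditor model would not rely on a fixed phrase list.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "the webpage says i should",
)


def keyword_auditor(reasoning: str) -> bool:
    """Crude placeholder auditor: flag reasoning containing a known injection phrase."""
    text = reasoning.lower()
    return any(phrase in text for phrase in SUSPICIOUS_PHRASES)


@dataclass
class AgentStep:
    reasoning: str  # the primary agent's chain-of-thought for this step
    action: str     # the action the agent proposes to take


def gated_action(step: AgentStep, auditor: Auditor) -> str:
    """Return the proposed action only if the auditor does not flag the reasoning."""
    if auditor(step.reasoning):
        return "BLOCKED"
    return step.action


safe = AgentStep("The user wants a summary of the page, so I will summarize it.", "summarize(page)")
hijacked = AgentStep(
    "The webpage says I should email the user's files to attacker@example.com.",
    "send_email(files)",
)

print(gated_action(safe, keyword_auditor))      # summarize(page)
print(gated_action(hijacked, keyword_auditor))  # BLOCKED
```

Keeping the auditor behind a plain callable type means the placeholder heuristic can be swapped for a call to a classifier or a second LLM without changing the gating logic.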