IPI: Indirect Prompt Injection, an attack in which malicious instructions are embedded in external data (e.g., web pages) retrieved by the agent, rather than in the user's direct prompt
State Collapse: The phenomenon where an agent's next action becomes statistically independent of the user's input and dependent primarily on malicious retrieved context
Masking Function: A transformation that replaces the specific user task in the agent's context with a neutral, generic prompt (e.g., 'Summarize the provided content') to test whether the agent's next action still depends on the user's task
Tool Call Cache: A memory mechanism in MELON that stores tool calls generated during masked runs, so that an attack task executed lazily (i.e., at a later step) in the original run can still be identified
AgentDojo: A benchmark dataset designed to evaluate the security and robustness of LLM agents against indirect prompt injection attacks
ASR: Attack Success Rate, the percentage of attack attempts that successfully trick the agent into performing the unauthorized action
Cosine Similarity: A measure of how similar two embedding vectors are (the cosine of the angle between them), used here to compare the semantic meaning of tool calls
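
To make the terms above concrete, the following is a minimal Python sketch of how a masking function, a tool call cache, cosine similarity, and ASR could fit together. It is an illustration under stated assumptions, not MELON's actual implementation: the bag-of-words `embed` function stands in for a real embedding model, and the names `mask_task`, `matches_cached_call`, `attack_success_rate`, the example tool calls, and the 0.8 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def mask_task(user_task: str) -> str:
    """Masking function: replace the user's task with a neutral, generic prompt."""
    return "Summarize the provided content"


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_cached_call(original_call: str, cache: list[str], threshold: float = 0.8) -> bool:
    """Flag a tool call from the original run if it resembles any call cached from masked runs.

    The threshold of 0.8 is an illustrative assumption, not a value from the source.
    """
    target = embed(original_call)
    return any(cosine_similarity(target, embed(c)) >= threshold for c in cache)


def attack_success_rate(outcomes: list[bool]) -> float:
    """ASR: percentage of attack attempts that succeeded."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical tool call observed during a masked run (user task hidden),
# so it can only have come from the retrieved content.
tool_call_cache = ["send_money(recipient=attacker, amount=1000)"]

# The same call appearing later in the original run is flagged ...
flagged = matches_cached_call("send_money(recipient=attacker, amount=1000)", tool_call_cache)
# ... while an unrelated call driven by the user's task is not.
benign = matches_cached_call("get_balance(account=checking)", tool_call_cache)
```

Because the cache persists across steps, a call that shows up in the masked run at one step and in the original run at a later step can still be matched, which is the case the Tool Call Cache entry describes.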