
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

S Ikbarieh, K Aryal, M Gupta
Not explicitly reported in the paper
arXiv, 11/2025
RAG Benchmark

📝 Paper Summary

Modularized RAG pipeline, IoT Security, Adversarial Attacks
This paper demonstrates that targeted, word-level data poisoning of a RAG knowledge base significantly degrades the performance of LLM-based IoT attack analysis and mitigation frameworks.
Core Problem
Integrating LLMs into Network Intrusion Detection Systems (NIDS) expands the attack surface. In particular, it exposes the system to RAG data poisoning, in which maliciously altered knowledge-base entries are retrieved as context and corrupt the LLM's analysis.
Why it matters:
  • The number of IoT devices is growing rapidly (18.8 billion by 2024), yet these devices are resource-constrained and highly vulnerable to cyberattacks
  • LLM-based defense frameworks are rarely tested against adversarial attacks, leaving a critical research gap regarding their reliability under retrieval corruption
  • Resource-constrained IoT environments require precise, device-specific mitigations, which poisoned models fail to provide
Concrete Example: An RF classifier correctly detects a 'Port Scanning' attack. However, because the RAG knowledge base was poisoned with a perturbed description, the system retrieves a description for 'Vulnerability Scanning' instead. Consequently, ChatGPT-5 Thinking provides mitigation advice for vulnerability scanning rather than the actual port scanning attack.
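The retrieval failure in this example can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for whatever embedding model the framework actually uses, and the query string and knowledge-base descriptions are hypothetical, not taken from the paper's dataset.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real system would use a dense encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, knowledge_base):
    """Return the (label, description) pair whose description best matches the query."""
    q = embed(query)
    return max(knowledge_base.items(), key=lambda kv: cosine(q, embed(kv[1])))

# Hypothetical knowledge-base entries.
clean_kb = {
    "Port Scanning": "Port scanning probes a host to find open ports and listening services.",
    "Vulnerability Scanning": "Vulnerability scanning checks a host for known software weaknesses.",
}

# Poisoned copy: the Port Scanning entry is reworded with near-synonyms, so it
# reads similarly to a human but no longer shares tokens with the query.
poisoned_kb = dict(clean_kb)
poisoned_kb["Port Scanning"] = (
    "Socket sweeping examines a machine to locate accessible sockets and listening daemons."
)

# Hypothetical query built from the classifier's predicted label.
query = "port scanning open ports host"

print(retrieve(query, clean_kb)[0])     # entry retrieved from the clean knowledge base
print(retrieve(query, poisoned_kb)[0])  # entry retrieved after poisoning
```

With the clean knowledge base the query matches the Port Scanning entry. After poisoning, the closest remaining match is the Vulnerability Scanning entry, so the downstream LLM would receive the wrong context even though the classifier's prediction was correct.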
Key Novelty
Transfer-learning-based RAG Data Poisoning for IoT NIDS
  • Constructs a dataset of 18 IoT attack descriptions and generates paraphrased variants to fine-tune a surrogate BERT model
  • Uses the surrogate model to craft word-level, meaning-preserving perturbations (via TextFooler) that target specific decision boundaries
  • Injects these adversarial descriptions into the RAG knowledge base to disrupt retrieval and degrade the downstream reasoning of a black-box LLM (ChatGPT-5 Thinking)
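The word-level attack step above can be sketched as a greedy substitution loop in the general style of TextFooler: rank words by how much they support the true class, then swap in substitutes until the surrogate's prediction flips. This is a simplified stand-in, not the paper's implementation. The keyword-weight scorer `SURROGATE_WEIGHTS` replaces the fine-tuned BERT surrogate, and the fixed `SYNONYMS` table replaces TextFooler's embedding-based synonym search and semantic-similarity checks. Both are hypothetical.

```python
# Toy surrogate classifier: per-class keyword weights (hypothetical stand-in for BERT).
SURROGATE_WEIGHTS = {
    "Port Scanning": {"port": 2.0, "ports": 2.0, "open": 1.0, "probes": 1.0},
    "Vulnerability Scanning": {"weaknesses": 2.0, "flaws": 2.0, "exposed": 1.5, "checks": 1.0},
}

# Hypothetical synonym table (a real attack would query word embeddings).
SYNONYMS = {
    "ports": ["endpoints"],
    "port": ["endpoint"],
    "open": ["exposed"],
    "probes": ["checks"],
}

def scores(tokens):
    """Score each class as the sum of its keyword weights present in the text."""
    return {label: sum(w.get(t, 0.0) for t in tokens) for label, w in SURROGATE_WEIGHTS.items()}

def predict(tokens):
    """Return the highest-scoring class label."""
    s = scores(tokens)
    return max(s, key=s.get)

def margin(tokens, true_label):
    """True-class score minus the best competing class score."""
    s = scores(tokens)
    return s[true_label] - max(v for k, v in s.items() if k != true_label)

def word_level_attack(text, true_label):
    """Greedily substitute words until the surrogate no longer predicts true_label."""
    tokens = text.lower().split()
    # Most important words first: those whose removal lowers the margin the most.
    order = sorted(range(len(tokens)),
                   key=lambda i: margin(tokens[:i] + tokens[i + 1:], true_label))
    for i in order:
        best = tokens
        for candidate in SYNONYMS.get(tokens[i], []):
            trial = tokens[:i] + [candidate] + tokens[i + 1:]
            if margin(trial, true_label) < margin(best, true_label):
                best = trial
        tokens = best
        if predict(tokens) != true_label:
            break  # decision boundary crossed; stop perturbing
    return " ".join(tokens)

original = "attacker probes host for open ports"
adversarial = word_level_attack(original, "Port Scanning")
print(adversarial, "->", predict(adversarial.split()))
```

In the poisoning setting, the perturbed description would then replace the original entry in the RAG knowledge base, so that the effect transfers from the surrogate to the retrieval step and on to the black-box LLM.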
Architecture
Figure 1
The complete framework pipeline including Attack Detection, RAG, Prompt Engineering, LLM Analysis, and the Adversarial Attack injection point.
Evaluation Highlights
  • Demonstrates successful degradation of ChatGPT-5 Thinking's performance in attack analysis and mitigation suggestion through RAG poisoning
  • Proposes a new IoT attack description dataset covering 18 attack types derived from Edge-IIoTset and CICIoT2023
  • Establishes a quantitative scoring rubric for evaluating LLM-based NIDS responses using both human experts and judge LLMs
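One way to picture rubric-based scoring with several evaluators is to average each criterion across evaluators and then average across criteria. The sketch below assumes that aggregation scheme. The criterion names (`analysis`, `mitigation`), the scale, and all numeric ratings are hypothetical placeholders, not the paper's rubric or results.

```python
from statistics import mean

def rubric_score(ratings):
    """Aggregate ratings from several evaluators (human experts or judge LLMs).

    ratings: list of dicts mapping criterion name -> score, one dict per evaluator.
    Returns (per-criterion averages, overall average across criteria).
    """
    criteria = ratings[0].keys()
    per_criterion = {c: mean(r[c] for r in ratings) for c in criteria}
    return per_criterion, mean(per_criterion.values())

# Placeholder ratings for one LLM response, before and after poisoning (illustrative only).
clean_ratings = [
    {"analysis": 9, "mitigation": 8},   # hypothetical human expert
    {"analysis": 9, "mitigation": 10},  # hypothetical judge LLM
]
poisoned_ratings = [
    {"analysis": 4, "mitigation": 3},
    {"analysis": 6, "mitigation": 3},
]

_, clean_overall = rubric_score(clean_ratings)
_, poisoned_overall = rubric_score(poisoned_ratings)
print(clean_overall - poisoned_overall)  # score drop attributed to poisoning
```

Comparing the aggregate score for the same query with and without the poisoned entry gives a single number for how much the attack degraded the response.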
Breakthrough Assessment
7/10
Solid application of adversarial NLP techniques to the specific domain of IoT NIDS. While the attack method (TextFooler on BERT) is established, applying it to poison RAG in a critical infrastructure context is a valuable contribution.