
EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

X Zhou, X Li, Y Peng, M Xu, X Zhang, M Yu, Y Wang…
ZJU, NTU, Hengxin Tech., NUS, NJU, PKU, Squirrel Ai Learning
arXiv, 12/2025 (2025)
RAG QA

📝 Paper Summary

RAG security · Adversarial attacks on RAG
Retrieval-Augmented Generation systems are highly vulnerable to symbolic perturbations: a single emoticon in a query can force the retrieval of irrelevant content that contains the same emoticon.
Core Problem
RAG systems assume retrieval is driven by semantic relevance, but they are actually highly sensitive to rare symbolic tokens (like emoticons) that hijack the embedding space regardless of semantic meaning.
Why it matters:
  • Adversaries can inject seemingly harmless emoticons into documents to force their retrieval over legitimate content
  • This vulnerability breaks the fundamental assumption of RAG that retrieved content is semantically relevant to the user's query
  • Current defenses like perplexity-based detection fail because emoticons are natural in online communication and do not trigger 'garbled text' alerts
Concrete Example: If a user queries 'How to implement quicksort? (@_@)', the system might ignore coding tutorials and instead retrieve a completely unrelated document about cooking that happens to contain the same '(@_@)' emoticon.
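The failure mode in this example can be probed with a small harness that compares the top-1 result for a clean query against the same query with an emoticon added. The sketch below is a toy under stated assumptions, not the paper's setup: `toy_embed` is a hypothetical bag-of-tokens embedder, and `SYMBOL_WEIGHT` is a made-up constant standing in for the outsized influence the paper reports for rare symbolic tokens in dense retrievers. To test a real system, pass its embedding function in place of `toy_embed`.

```python
import math
from collections import Counter

# ASSUMPTION: made-up weight that mimics a rare symbol dominating the vector.
# It is not measured from any real embedding model.
SYMBOL_WEIGHT = 5.0

def is_symbolic(token):
    # Treat a token with no letters or digits, such as "(@_@)", as symbolic.
    return not any(ch.isalnum() for ch in token)

def toy_embed(text):
    # Hypothetical sparse embedding: token -> weighted count.
    counts = Counter(text.lower().split())
    return {t: c * (SYMBOL_WEIGHT if is_symbolic(t) else 1.0)
            for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top1(query, docs, embed=toy_embed):
    # Index of the document with the highest cosine similarity to the query.
    q = embed(query)
    scores = [cosine(q, embed(d)) for d in docs]
    return max(range(len(docs)), key=scores.__getitem__)

docs = [
    "quicksort algorithm tutorial how to implement quicksort in python",
    "delicious apple pie recipe (@_@)",
]
clean_hit = top1("how to implement quicksort", docs)
perturbed_hit = top1("(@_@) how to implement quicksort", docs)
print("clean:", docs[clean_hit])
print("perturbed:", docs[perturbed_hit])
```

In this toy, the clean query retrieves the quicksort document and the perturbed query retrieves the recipe, purely because the shared symbol carries most of the vector's norm. Whether a given production retriever behaves this way has to be checked empirically.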
Key Novelty
EmoRAG (Emoticon-based Retrieval Augmented Generation Attack)
  • Demonstrates a 'decoupling' of semantic relevance and retrieval outcome: symbolic matches (emoticons) dominate semantic matches in vector space
  • Identifies that emoticons at the start of a query shift positional embeddings significantly, altering the entire query representation
  • Shows that larger models are counter-intuitively *more* vulnerable to this sparse token interference than smaller models
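To compare the effect of emoticon position on a retriever, one might generate perturbed variants of each query as sketched below. The helper name `inject_emoticon` and the three positions (`start`, `middle`, `end`) are illustrative assumptions; the summary above only states that placement at the start has a large effect.

```python
def inject_emoticon(query, emoticon, position="start"):
    """Return the query with the emoticon inserted as its own token.

    The position names are hypothetical choices for this sketch.
    """
    words = query.split()
    if position == "start":
        idx = 0
    elif position == "middle":
        idx = len(words) // 2
    elif position == "end":
        idx = len(words)
    else:
        raise ValueError(f"unknown position: {position!r}")
    return " ".join(words[:idx] + [emoticon] + words[idx:])

query = "How to implement quicksort?"
variants = {pos: inject_emoticon(query, "(@_@)", pos)
            for pos in ("start", "middle", "end")}
for pos, text in variants.items():
    print(f"{pos:>6}: {text}")
```

Each variant can then be sent through the same retriever so that any change in the retrieved set is attributable to position alone.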
Architecture
Figure 1: Conceptual illustration of the EmoRAG attack. A user query 'How to implement quicksort? (@_@)' is hijacked to retrieve an unrelated document 'Delicious apple pie recipe (@_@)' instead of the relevant 'Quicksort Algorithm' document.
Evaluation Highlights
  • Injecting a single emoticon at the beginning of a query pushes the F1 score for retrieving the irrelevant target content above 0.92 on every dataset
  • Large models (>7B parameters) are extremely vulnerable, with Attack Success Rates (ASR) approaching 100% under perturbation
  • A BERT-based defense model trained on perturbed text detects emoticon attacks with 99% accuracy
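The two attack metrics above could be computed roughly as follows. The exact definitions used in the paper are not given in this summary, so the ones below are assumptions: F1 is taken over the set of retrieved document IDs against the set of injected target IDs, and an attack counts as successful when the generated answer contains the attacker's target string. The function names are hypothetical.

```python
def retrieval_f1(retrieved_ids, target_ids):
    # ASSUMED definition: precision and recall of the injected target
    # documents within the retrieved set.
    retrieved, targets = set(retrieved_ids), set(target_ids)
    hits = len(retrieved & targets)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(targets)
    return 2 * precision * recall / (precision + recall)

def attack_success_rate(answers, target_answers):
    # ASSUMED definition: fraction of generated answers that contain the
    # attacker's target string (case-insensitive substring match).
    if not answers:
        return 0.0
    successes = sum(target.lower() in answer.lower()
                    for answer, target in zip(answers, target_answers))
    return successes / len(answers)

# Made-up inputs that only exercise the functions; these are not results.
f1 = retrieval_f1(["d1", "d2", "d3", "d4", "d5"], ["d1", "d2", "d3", "d4"])
asr = attack_success_rate(
    ["Try this apple pie recipe", "Quicksort partitions the array"],
    ["apple pie", "apple pie"],
)
print(f"F1 = {f1:.3f}, ASR = {asr:.2f}")
```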
Breakthrough Assessment
8/10
Reveals a critical, previously overlooked vulnerability in RAG systems (symbolic hijacking) that affects almost all state-of-the-art retrievers and generators, with a very simple attack vector.