RAG: Retrieval-Augmented Generation; AI systems that answer questions by first retrieving relevant documents and then generating an answer conditioned on them
ASR: Attack Success Rate; the percentage of attack attempts in which the victim model generates the attacker's desired incorrect answer
Recall@k: A metric measuring whether the malicious passage appears in the top-k retrieved documents
White-box attack: An attack where the adversary has full access to the model's parameters and gradients
Black-box attack: An attack where the adversary has no access to model parameters and can only observe inputs and outputs, sometimes using a surrogate model to craft the attack
Top-k sampling: A text generation method where the model samples the next token from the k most probable options
NER: Named Entity Recognition; identifying and classifying entity mentions such as names, dates, or locations in text
Sentence-BERT: A modification of the BERT network that uses a siamese network structure to derive semantically meaningful sentence embeddings that can be compared using cosine similarity
Gradient-based methods: Optimization techniques that use the gradient (derivative) of a function to find the best inputs; they require access to the model's gradients
Data poisoning: Injecting malicious data into a training set or knowledge base to corrupt the model's behavior
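
As a rough illustration of three of the terms above (ASR, Recall@k, and top-k sampling), the following is a minimal Python sketch, not a reference implementation. The function names, the substring match used to decide whether an answer counts as a success, and the choice to average Recall@k over queries are all assumptions made for the example.

```python
import random

def attack_success_rate(outputs, target_answers):
    """ASR: percentage of outputs that contain the attacker's target answer.

    Assumption: success is judged by a case-insensitive substring match.
    """
    hits = sum(t.lower() in o.lower() for o, t in zip(outputs, target_answers))
    return 100.0 * hits / len(outputs)

def recall_at_k(ranked_ids_per_query, malicious_ids, k):
    """Recall@k: fraction of queries with a malicious passage in the top-k results.

    Assumption: the per-query indicator is averaged over all queries.
    """
    hits = sum(
        any(doc_id in malicious_ids for doc_id in ranked[:k])
        for ranked in ranked_ids_per_query
    )
    return hits / len(ranked_ids_per_query)

def top_k_sample(token_probs, k, rng=random):
    """Top-k sampling: draw one token from the k most probable tokens.

    Probabilities of the kept tokens act as sampling weights, which
    renormalises them implicitly.
    """
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, `recall_at_k([["a", "x", "b"], ["c", "d", "x"]], {"x"}, k=2)` counts only the first query as a hit, because the malicious passage `"x"` is ranked third for the second query.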