RAG-DI: RAG Dataset Inference, the problem of detecting whether a specific dataset is included in the knowledge base of a Retrieval-Augmented Generation (RAG) system
LLM Watermarking: Embedding a statistical signal into text generated by an LLM (typically by biasing vocabulary choice) to allow later detection
Red-Green Watermark: A specific watermarking scheme where the vocabulary is split into 'green' (promoted) and 'red' (demoted) tokens based on the preceding context
MIA: Membership Inference Attack, an attack that determines whether a specific data point was used to train a model or is present in a database
Fact Redundancy: A realistic scenario where the same factual information appears in multiple documents in a corpus, complicating attribution
FARAD: Fact-Redundant Article Dataset, a new benchmark introduced in this paper to evaluate RAG-DI under realistic conditions of information overlap
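z-score: A standardized statistic giving how many standard deviations an observed value lies from the mean expected under a reference distribution; used here to measure watermark strength (e.g., how far the number of green tokens in a text exceeds what unwatermarked text would produce)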
p-value: The probability, assuming the null hypothesis (no watermark) is true, of observing a result at least as extreme as the one actually observed
System Prompt Defense: Instructions given to an LLM (e.g., 'do not reveal sources') intended to prevent it from leaking information about its context
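The Red-Green Watermark, z-score, and p-value entries fit together into a single detection test. Below is a minimal sketch of a generic red-green scheme, assuming the green list is seeded by a hash of the previous token plus a secret key. The function names (`green_list`, `bias_logits`, `watermark_z_score`, `p_value`) and the values of `GAMMA`, `DELTA`, and `KEY` are hypothetical illustrations, not the paper's implementation.

```python
import hashlib
import math
import random

# Hypothetical parameters (assumptions, not values from the paper).
GAMMA = 0.25  # fraction of the vocabulary placed on the green list
DELTA = 2.0   # logit bonus added to green tokens during generation
KEY = 42      # secret key shared by the generator and the detector


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> set[int]:
    """Pseudo-randomly split the vocabulary, seeded by the previous token and KEY."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], prev_token: int, delta: float = DELTA) -> list[float]:
    """Add delta to green-token logits; red tokens are left as-is, so they are demoted relatively."""
    green = green_list(prev_token, len(logits))
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def watermark_z_score(tokens: list[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """z = (green hits - gamma*T) / sqrt(T*gamma*(1-gamma)) over T scored tokens."""
    T = len(tokens) - 1  # the first token has no predecessor to seed its green list
    if T < 1:
        raise ValueError("need at least two tokens")
    hits = sum(cur in green_list(prev, vocab_size, gamma)
               for prev, cur in zip(tokens, tokens[1:]))
    return (hits - gamma * T) / math.sqrt(T * gamma * (1 - gamma))


def p_value(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal Z (null: no watermark)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    V = 1000
    # Idealized watermarked text: every token is drawn from the previous token's green list.
    wm = [0]
    for _ in range(200):
        wm.append(min(green_list(wm[-1], V)))
    # Unwatermarked text: tokens drawn uniformly at random.
    rng = random.Random(0)
    plain = [rng.randrange(V) for _ in range(201)]
    for name, seq in [("watermarked", wm), ("unwatermarked", plain)]:
        z = watermark_z_score(seq, V)
        print(f"{name}: z={z:.2f}, p={p_value(z):.3g}")
```

Unwatermarked text lands on the green list only about a γ fraction of the time, so its z-score stays near 0 and its p-value stays large. Text generated with the logit bias accumulates excess green hits, which drives z up and the one-sided p-value toward 0.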