RAG: Retrieval-Augmented Generation—AI systems that answer questions by first searching for relevant documents, then generating responses based on what they find
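Spiral of Silence: A communication theory in which people holding minority views stay silent for fear of isolation, so those views fade from public discourse; adapted here to describe human-written content being marginalized as retrieval algorithms favor LLM-generated text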
ODQA: Open Domain Question Answering—answering questions using a large collection of documents without a pre-defined domain
Self-BLEU: A metric measuring the diversity of generated text by computing BLEU for each text against the other texts in the same set; higher scores indicate lower diversity (more repetition)
BM25: A probabilistic ranking function for sparse retrieval that scores documents by exact keyword matches, weighting term frequency by inverse document frequency and normalizing for document length
Contriever: A dense retrieval model trained using contrastive learning to match queries and documents in a semantic vector space
Acc@5: Accuracy at 5, the percentage of queries for which the correct answer appears in at least one of the top 5 retrieved documents
Exact Match (EM): A metric checking whether the generated answer contains the ground-truth answer string verbatim
Zero-shot: Using a model to perform a task without providing any worked examples (demonstrations) of that task in the prompt
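
The Acc@5 and Exact Match definitions above can be illustrated with a minimal Python sketch. It assumes a simple normalization (lowercasing, punctuation stripping, whitespace collapsing) and plain substring containment; the function names `normalize`, `exact_match`, and `acc_at_k` are hypothetical and are not taken from any particular evaluation toolkit, and real pipelines may normalize or match differently.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed normalization)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """Containment-style EM: True if any normalized gold answer occurs in the prediction."""
    pred = normalize(prediction)
    golds = [normalize(g) for g in gold_answers]
    # Skip gold answers that normalize to an empty string; they would match everything.
    return any(g and g in pred for g in golds)


def acc_at_k(retrieved: list[list[str]], gold_answers: list[list[str]], k: int = 5) -> float:
    """Fraction of queries whose top-k retrieved documents contain a gold answer."""
    if not retrieved:
        return 0.0
    hits = 0
    for docs, golds in zip(retrieved, gold_answers):
        # A query counts as a hit if at least one of its top-k documents matches.
        if any(exact_match(doc, golds) for doc in docs[:k]):
            hits += 1
    return hits / len(retrieved)


if __name__ == "__main__":
    ranked_docs = [
        ["Paris is the capital of France.", "Lyon is a French city."],
        ["Nothing relevant here."],
    ]
    answers = [["Paris"], ["Berlin"]]
    print(acc_at_k(ranked_docs, answers, k=5))
    print(exact_match("The capital is Paris.", ["paris"]))
```

Because matching here is by raw substring, a short answer can also match inside a longer word; a token-level comparison would be stricter.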