proposition: An atomic, self-contained expression of a distinct factoid within text, generated to stand alone without surrounding context
FactoidWiki: A processed version of English Wikipedia where each page is segmented into passages, sentences, and propositions for retrieval experiments
dense retrieval: Retrieval method using vector embeddings to match queries and documents, as opposed to keyword matching
Recall@k: The percentage of questions where the correct answer appears in the top-k retrieved documents
EM: Exact Match, a metric that checks whether the predicted answer string exactly matches the ground-truth answer
SimCSE: A contrastive learning framework for training sentence embeddings
Contriever: An unsupervised dense retriever trained via contrastive learning
DPR: Dense Passage Retriever, a supervised dual-encoder model trained on QA pairs
GTR: Generalizable T5-based dense retriever
FiD: Fusion-in-Decoder, a method in which a model encodes each retrieved passage independently and fuses the encodings in the decoder to generate an answer
BM25: A probabilistic retrieval function based on term frequency and inverse document frequency (keyword matching)
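
To make the two evaluation metrics defined above (Recall@k and EM) concrete, here is a minimal Python sketch. It rests on assumptions the glossary does not state: answers are normalized by lowercasing and by stripping punctuation and English articles, a retrieved document counts as a hit when it contains a normalized gold answer as a substring, and a question may have several acceptable gold answers. The function names are hypothetical and are not taken from any particular evaluation script.

```python
import re
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """EM: the normalized prediction equals at least one normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def hit_at_k(retrieved: list[str], gold_answers: list[str], k: int) -> bool:
    """True if any of the top-k retrieved texts contains a gold answer.

    Substring matching is an assumption; it can also match inside longer words.
    """
    golds = [normalize(gold) for gold in gold_answers]
    return any(gold in normalize(doc) for doc in retrieved[:k] for gold in golds)


def recall_at_k(all_retrieved: list[list[str]], all_gold: list[list[str]], k: int) -> float:
    """Recall@k: percentage of questions with at least one hit in the top-k results."""
    hits = sum(hit_at_k(docs, golds, k) for docs, golds in zip(all_retrieved, all_gold))
    return 100.0 * hits / len(all_gold)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    retrieved = [
        ["Berlin is in Germany.", "Paris is the capital of France."],
        ["The Nile is a river."],
    ]
    gold = [["Paris"], ["Amazon"]]
    print(recall_at_k(retrieved, gold, k=1))
    print(recall_at_k(retrieved, gold, k=2))
    print(exact_match("The Paris", ["Paris"]))
```

In this sketch the retrieved lists are assumed to be already ranked, so the same functions apply whether the retrieval unit is a passage, a sentence, or a proposition.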