CIT: Corpus-Invariant Tuning—a training strategy that adds a loss term to prevent the reader model from becoming better at generating the retrieved documents themselves
OpenQA: Open-domain Question Answering—answering fact-based questions using a large, unstructured text corpus rather than a specific context provided with the question
Atlas: A state-of-the-art retrieval-augmented language model architecture that uses Contriever for retrieval and Fusion-in-Decoder (FiD) for reading
EM: Exact Match—a metric measuring the percentage of predictions that match the ground truth answer exactly
FiD: Fusion-in-Decoder—a reader architecture that processes retrieved documents independently in the encoder and fuses representations in the decoder
Contriever: A dense information retrieval model trained using contrastive learning
Masked Span Prediction: A pre-training objective where the model predicts masked-out sequences of text, used here as a proxy for the likelihood of the document
CRP: Cross-domain Relative Performance—a metric defined in this paper as the ratio of cross-domain performance to intra-domain performance
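
The two metrics above, EM and CRP, can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, the answer normalization (lowercasing, stripping punctuation and English articles) is an assumed convention that the glossary does not specify, and CRP is assumed to be computed over two scores of the same metric, such as EM.

```python
import re
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions equal to any accepted answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    hits = sum(
        any(normalize(pred) == normalize(ref) for ref in refs)
        for pred, refs in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


def cross_domain_relative_performance(cross_domain: float, intra_domain: float) -> float:
    """CRP as defined in the glossary: cross-domain score divided by intra-domain score."""
    if intra_domain == 0:
        raise ValueError("intra-domain performance must be non-zero")
    return cross_domain / intra_domain
```

With made-up scores, an intra-domain EM of 40.0 and a cross-domain EM of 30.0 give a CRP of 0.75. A value near 1 would mean the model loses little when moved to a new domain.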