Adapter: A small neural module added to a pre-trained model to learn new tasks/knowledge while keeping the main model frozen
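A minimal sketch of the adapter idea in numpy (toy sizes and initialization are my assumptions, not from the source): the pre-trained weights stay frozen, and only a small bottleneck module, wrapped in a residual connection, would be trained for the new task.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, bottleneck = 8, 2  # toy dimensions, chosen for illustration

# Frozen pre-trained layer: its weights are never updated.
W_frozen = rng.normal(size=(hidden, hidden))

# Adapter: down-project, nonlinearity, up-project. These few parameters
# are the only ones that would be trained on the new task.
W_down = rng.normal(size=(hidden, bottleneck)) * 0.01
W_up = np.zeros((bottleneck, hidden))  # zero init -> adapter starts as identity

def adapter(h):
    # Residual form: at initialization the adapter passes h through unchanged,
    # so adding it does not disturb the pre-trained model's behavior.
    return h + np.maximum(h @ W_down, 0.0) @ W_up

def layer_with_adapter(x):
    h = x @ W_frozen     # frozen pre-trained computation
    return adapter(h)    # small trainable module on top

x = rng.normal(size=(1, hidden))
out = layer_with_adapter(x)
print(out.shape)  # (1, 8)
```

Because only the adapter's parameters change during fine-tuning, the frozen model cannot catastrophically forget what it already knows.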
Catastrophic forgetting: The tendency of neural networks to lose previously acquired knowledge when trained on new information
RoBERTa: Robustly Optimized BERT Pretraining Approach; a transformer-based masked language model
T-REx: A large-scale alignment dataset between Wikipedia abstracts and Wikidata triples, used here for factual knowledge
Dependency parsing: Analyzing the grammatical structure of a sentence to establish relationships between 'head' words and words which modify those heads
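A toy encoding of a dependency parse (not a parser), assuming the common head-index scheme: each token records the index of the word it modifies, and the modifier/head arcs can be read off directly. The sentence and indices are illustrative.

```python
sentence = ["She", "reads", "books"]
heads = [1, -1, 1]  # "She" and "books" both modify the head "reads"; -1 marks the root

def arcs(sentence, heads):
    # Yield (modifier, head) pairs established by the parse, skipping the root.
    return [(sentence[i], sentence[h]) for i, h in enumerate(heads) if h >= 0]

print(arcs(sentence, heads))  # [('She', 'reads'), ('books', 'reads')]
```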
LAMA: LAnguage Model Analysis—a probe to test factual knowledge in language models using cloze-style questions
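A hedged sketch of how a cloze-style LAMA query is built: a relation template with `[X]`/`[Y]` placeholders is filled with the subject, and the object slot becomes the `[MASK]` token the model must predict. The template and fact here are illustrative, not taken from the LAMA data.

```python
def to_cloze(subject, template):
    # Fill the subject slot and turn the object slot into a masked position.
    return template.replace("[X]", subject).replace("[Y]", "[MASK]")

query = to_cloze("France", "The capital of [X] is [Y].")
print(query)  # The capital of France is [MASK].
```

A model that knows the fact should rank the correct answer ("Paris") highly for the masked position.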
Disentangled representation: Representations where different types of information (e.g., syntax vs. facts) are separated rather than mixed together
Skip-connection: A direct connection between non-adjacent layers in a neural network that allows information to bypass intermediate layers
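The skip-connection above can be sketched in a few lines (the transformation inside the block is an arbitrary placeholder): the input bypasses the intermediate computation and is added to its output.

```python
import numpy as np

def block(x, W):
    # Some intermediate transformation the skip-connection routes around.
    return np.tanh(x @ W)

def block_with_skip(x, W):
    # Skip-connection: the input is added to the block's output, so
    # information (and gradients) can flow past the intermediate layer.
    return x + block(x, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
W = np.zeros((4, 4))  # with a zero transform the block contributes nothing...
print(np.allclose(block_with_skip(x, W), x))  # True: the skip passes x through
```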