MLM: Masked Language Modeling—a pretraining task where the model predicts masked tokens in a sequence
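As a concrete illustration of the masking scheme, the following sketch follows the commonly described BERT recipe (select ~15% of positions; of those, 80% become a [MASK] token, 10% a random token, 10% are left unchanged, with the original token serving as the prediction target). This is an illustrative toy implementation, not BERT's actual preprocessing code; the function name and vocabulary argument are invented here.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style MLM masking sketch.

    Returns (masked_tokens, labels) where labels[i] holds the original
    token at each selected position (the prediction target) and None
    elsewhere (position excluded from the loss).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok            # model must recover this token
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"   # 80%: replace with mask token
            elif r < 0.9:
                masked[i] = rng.choice(vocab)  # 10%: random token
            # else: 10% keep the original token unchanged
    return masked, labels
```

Keeping 10% of selected tokens unchanged is usually motivated by preventing a mismatch between pretraining (where [MASK] appears) and fine-tuning (where it never does).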
NSP: Next Sentence Prediction—a pretraining task where the model predicts whether the second of two sentences follows the first in the original text
LRC: Lexical Relation Classification—the proposed auxiliary task where the model predicts whether a word pair holds a valid semantic relation
GLUE: General Language Understanding Evaluation—a benchmark suite of diverse natural language understanding tasks
WordPiece: A subword tokenization algorithm used by BERT that splits words not in its vocabulary into smaller known units, keeping the vocabulary size fixed
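The core of WordPiece segmentation at inference time is greedy longest-match-first lookup, with continuation pieces marked by a "##" prefix. The sketch below illustrates that lookup on a toy vocabulary; it is an assumption-laden simplification (function name and vocabulary are invented, and real tokenizers handle casing, punctuation, and byte fallback).

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword segmentation sketch.

    Repeatedly take the longest prefix of the remaining string that is
    in the vocabulary; non-initial pieces are looked up with a '##'
    prefix. If no piece matches, the whole word maps to the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub        # continuation-piece marker
            if sub in vocab:
                match = sub
                break
            end -= 1                    # shrink candidate and retry
        if match is None:
            return [unk]                # no segmentation found
        pieces.append(match)
        start = end
    return pieces
```

For example, with a vocabulary containing "un", "##aff", and "##able", the word "unaffable" segments into ["un", "##aff", "##able"].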
Lexical Simplification: The task of replacing complex words in a sentence with simpler alternatives of equivalent meaning
Synonymy: A relationship where words have the same or nearly the same meaning
Hypernymy: A relationship where one word is a general category of another (e.g., 'vehicle' is a hypernym of 'car')
Retrofitting: Post-processing word vectors to move similar words closer together based on external lexicons
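The retrofitting update can be sketched as an iterative averaging step in the style of Faruqui et al. (2015): each vector is pulled toward the mean of its lexicon neighbours while staying anchored to its original embedding. The weighting choice below (neighbour weight 1/degree, anchor weight alpha=1) is one common setting, and the pure-Python implementation is a toy sketch for small inputs, not the reference code.

```python
def retrofit(vectors, lexicon, iterations=10, alpha=1.0):
    """Retrofitting sketch: blend each word's original vector with the
    current vectors of its lexicon neighbours.

    vectors: dict word -> list of floats (original embeddings)
    lexicon: dict word -> list of neighbour words (e.g. synonyms)
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for w, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue                    # no evidence: keep vector
            beta = 1.0 / len(nbrs)          # uniform neighbour weight
            dim = len(new[w])
            new[w] = [
                (alpha * vectors[w][d]
                 + sum(beta * new[n][d] for n in nbrs))
                / (alpha + 1.0)             # alpha + sum of betas
                for d in range(dim)
            ]
    return new
```

On a toy pair of "synonyms" with orthogonal starting vectors, the retrofitted vectors end up strictly closer together than the originals while remaining distinct, since each stays anchored to its starting point.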