WKLM: Weakly Supervised Knowledge-Pretrained Language Model—the proposed model that learns to distinguish true entities from same-type replacements.
MLM: Masked Language Model—a pretraining objective where randomly selected tokens are hidden (typically replaced with a special [MASK] token) and the model must predict the original tokens.
Hits@10: A metric measuring the percentage of times the correct answer appears in the top 10 predictions.
Entity Replacement: The core pretraining strategy where an entity mention is swapped with a random entity of the same type to create a negative training example.
Zero-shot fact completion: A task where the model must predict missing entities in factual statements (converted from knowledge base triples) without specific training on those facts.
FIGER: A dataset for fine-grained entity typing, requiring models to assign specific types to entity mentions.
SQuAD: Stanford Question Answering Dataset—a reading comprehension benchmark.
Wikidata: A structured knowledge base used here to determine entity types and validate relations.
Entity Linking: The process of identifying entity mentions in text and mapping them to unique identifiers in a knowledge base.
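The MLM objective defined above can be illustrated with a minimal sketch; the function name `mask_tokens`, the 15% masking rate, and the `[MASK]` string are illustrative choices, not details taken from this paper.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly hide tokens; the model is trained to predict the originals.

    Returns the corrupted sequence and the prediction targets:
    the original token at each masked position, None elsewhere.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(tok)   # loss is computed against this token
        else:
            corrupted.append(tok)
            targets.append(None)  # no loss at unmasked positions
    return corrupted, targets
```

In practice BERT-style MLM adds refinements (keeping some selected tokens unchanged or substituting random ones), which this sketch omits.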
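Hits@10, as defined above, reduces to a simple count over ranked prediction lists; this `hits_at_k` helper is a hypothetical illustration, not code from the paper.

```python
def hits_at_k(ranked_predictions, gold_answers, k=10):
    """Fraction of queries whose gold answer appears in the top-k predictions.

    ranked_predictions: one list of candidates per query, best first.
    gold_answers: the correct answer for each query.
    """
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if gold in preds[:k]
    )
    return hits / len(gold_answers)
```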
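The entity replacement strategy can be sketched as sampling a distractor of the same type; the `corrupt_entity` function and the flat `entities_by_type` index standing in for Wikidata's type information are assumptions for illustration.

```python
import random

def corrupt_entity(mention, entity_type, entities_by_type, rng=random):
    """Swap a true entity mention for a random same-type entity.

    The swapped-in name creates a negative example: the model must
    learn that the replacement does not fit the surrounding context.
    entities_by_type maps a type name to candidate entity names.
    """
    candidates = [e for e in entities_by_type[entity_type] if e != mention]
    return rng.choice(candidates)
```

Sampling from the same type keeps negatives grammatically plausible, so the model must rely on factual knowledge rather than type cues to detect the replacement.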