cloze sentence: A sentence with a blank space that the model is asked to fill (e.g., 'The capital of France is ____')
distractor: An incorrect but plausible alternative answer used to test if the model can distinguish truth from likely falsehoods
OOS: Out-Of-Subject continuation—text generated by an LM that is grammatically correct but irrelevant to the factual query (e.g., 'Paris is... a nice city')
Kendall's τ: A rank correlation coefficient that measures the ordinal association between two quantities (here, metric scores vs. human ratings)
ApprOpt: Approximation of Optimal Distractors—a strategy to find distractors by beam-searching the LM's own high-probability generations constrained to incorrect entities
Plausibility: The sum of probabilities of all valid labels (aliases) for a specific entity given a context
KaRR: Knowledge Assessment Risk Ratio, a statistical metric that estimates the ratio between the probability that the LM generates the correct answer and the probability of generating it by pure chance
TF-IDF: Term Frequency-Inverse Document Frequency—a statistical measure used to evaluate how relevant a word is to a document in a collection
Precision@n: A metric that checks whether the correct answer appears among the top-n most probable generations
Wikidata: A free, collaborative, multilingual knowledge base of structured data that supports Wikipedia and other Wikimedia projects
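
Three of the terms above (Plausibility, Precision@n, and Kendall's τ) are simple computations, so a minimal Python sketch may help make them concrete. Everything in it is an assumption for illustration: the function names, the probabilities, and the ratings are made up, and the τ shown is the basic tau-a variant, which ignores ties and may differ from the variant used in a given evaluation.

```python
from itertools import combinations


def plausibility(label_probs, aliases):
    """Sum the probabilities of every valid label (alias) of one entity."""
    return sum(label_probs.get(alias, 0.0) for alias in aliases)


def precision_at_n(ranked_generations, correct_labels, n):
    """Return 1 if any correct label is among the top-n generations, else 0."""
    return int(any(g in correct_labels for g in ranked_generations[:n]))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical probabilities an LM assigns to completions of
# 'The capital of France is ____' (made-up numbers, not real model output).
label_probs = {
    "Paris": 0.5,
    "the city of Paris": 0.125,
    "Lyon": 0.25,
    "London": 0.125,
}
paris_aliases = ["Paris", "the city of Paris"]

paris_plausibility = plausibility(label_probs, paris_aliases)
ranked = sorted(label_probs, key=label_probs.get, reverse=True)
hit_at_1 = precision_at_n(ranked, set(paris_aliases), n=1)

# Hypothetical metric scores and human ratings for four facts.
metric_scores = [0.9, 0.4, 0.7, 0.1]
human_ratings = [5, 3, 2, 1]
tau = kendall_tau_a(metric_scores, human_ratings)

print(paris_plausibility, hit_at_1, round(tau, 3))
```

In this toy setup the two aliases of Paris are summed into one plausibility score, the top-ranked generation is a correct label, and five of the six score/rating pairs are ordered the same way, which gives a positive τ.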