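Known (category): Questions where the pre-trained model always predicts the correct answer with greedy decoding (Temperature=0), i.e. every greedy prediction is correct.
MaybeKnown (category): Questions where greedy decoding predicts the correct answer only some of the time, i.e. the fraction of correct greedy predictions is strictly between 0 and 1 (inconsistent). Because greedy decoding is deterministic for a fixed prompt, this variation arises across different prompts (e.g. different few-shot examples).
WeaklyKnown (category): Questions where greedy decoding never yields the correct answer (score = 0), but random sampling (Temperature > 0) sometimes does.
Unknown (category): Questions where the model never predicts the correct answer, with either greedy decoding or sampling; this suggests the model lacks the relevant knowledge entirely.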
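The four categories above reduce to two quantities: the fraction of correct greedy predictions and whether any sampled answer is correct. A minimal sketch of that decision rule follows, assuming (hypothetically) that correctness has already been computed per query and is passed in as lists of booleans; the function name `knowledge_category` is illustrative, not from the source.

```python
from typing import Sequence


def knowledge_category(greedy_correct: Sequence[bool], sampled_correct: Sequence[bool]) -> str:
    """Map per-query correctness flags to Known / MaybeKnown / WeaklyKnown / Unknown.

    greedy_correct: one flag per greedy (Temperature=0) query, e.g. one per prompt
        variant (the input format is an assumption for illustration).
    sampled_correct: one flag per sampled (Temperature > 0) answer.
    """
    if not greedy_correct:
        raise ValueError("need at least one greedy prediction")
    # Fraction of greedy predictions that match the ground truth.
    p_greedy = sum(greedy_correct) / len(greedy_correct)
    if p_greedy == 1.0:
        return "Known"        # greedy is always correct
    if p_greedy > 0.0:
        return "MaybeKnown"   # greedy is correct only some of the time
    # Greedy never succeeds, so check whether sampling ever does.
    if any(sampled_correct):
        return "WeaklyKnown"
    return "Unknown"


print(knowledge_category([True, False, True], [True]))  # MaybeKnown
```

Note that the sampled flags only matter once greedy accuracy is zero, which mirrors the ordering of the definitions.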
Greedy decoding: A generation strategy where the model always picks the single most likely next token.
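Temperature sampling: A generation strategy where the model samples the next token from its predicted probability distribution, with the logits divided by the temperature T before the softmax (higher T flattens the distribution, lower T sharpens it). This produces more diverse outputs, which can include the correct answer even when the correct token was not the single most likely choice.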
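The contrast between the two decoding strategies can be seen on a single decoding step. The sketch below uses toy, hypothetical logits over a three-token vocabulary: greedy decoding always returns the top-scoring token, while temperature sampling can return a lower-ranked token. Treating T <= 0 as greedy is a common convention assumed here, not something stated in the source.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding step: index of the highest-scoring token (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Temperature sampling step: draw a token index from the scaled distribution."""
    if temperature <= 0:
        return greedy_pick(logits)  # assumed convention: T=0 means greedy
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


vocab = ["A", "B", "C"]   # hypothetical tokens
logits = [2.0, 1.5, 0.1]  # toy scores; suppose the ground truth is "B"
rng = random.Random(0)
print(vocab[greedy_pick(logits)])  # A
print([vocab[sample_pick(logits, 1.0, rng)] for _ in range(10)])
```

With these logits, greedy decoding can never produce "B", whereas sampling at T = 1.0 assigns it roughly a one-in-three chance per draw; this is the situation the WeaklyKnown category describes.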
Exact Match (EM): Evaluation metric checking if the generated text is identical to the ground truth.
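A minimal sketch of an exact-match check. Strict EM compares strings character for character; many QA evaluations first apply a light normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace, in the style of the SQuAD evaluation script). Whether the experiments described here normalize answers is not stated, so the `normalized` flag and the `normalize` helper are assumptions for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization (an assumption; strict EM skips this)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())                # collapse whitespace


def exact_match(prediction: str, ground_truth: str, normalized: bool = True) -> bool:
    """Return True if the prediction equals the ground truth (optionally after normalization)."""
    if normalized:
        return normalize(prediction) == normalize(ground_truth)
    return prediction == ground_truth


print(exact_match("The Eiffel Tower.", "eiffel tower"))                    # True
print(exact_match("The Eiffel Tower.", "eiffel tower", normalized=False))  # False
```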
Entity_Questions: The dataset used in the experiments, built by converting Wikidata (subject, relation, object) triples into question-answer pairs.
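To make the triple-to-QA conversion concrete, here is a minimal sketch that fills a per-relation question template with the subject and uses the object as the answer. The two templates and the `triple_to_qa` helper are hypothetical illustrations, not the dataset's actual templates; the keys are real Wikidata property IDs (P19 = place of birth, P36 = capital).

```python
from typing import NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    relation: str  # Wikidata property ID, e.g. "P19"
    obj: str


# Hypothetical question templates keyed by Wikidata property ID (illustrative only).
TEMPLATES = {
    "P19": "Where was {subject} born?",          # P19 = place of birth
    "P36": "What is the capital of {subject}?",  # P36 = capital
}


def triple_to_qa(triple: Triple) -> Tuple[str, str]:
    """Turn a (subject, relation, object) triple into a (question, answer) pair."""
    template = TEMPLATES.get(triple.relation)
    if template is None:
        raise KeyError(f"no template for relation {triple.relation}")
    return template.format(subject=triple.subject), triple.obj


question, answer = triple_to_qa(Triple("Ada Lovelace", "P19", "London"))
print(question)  # Where was Ada Lovelace born?
print(answer)    # London
```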
PaLM 2-S: The size variant of PaLM 2 used as the pre-trained base model in the experiments.