Knowledge Overshadowing: A phenomenon in which more prevalent (popular) knowledge representations compete with and suppress less frequent knowledge, causing the model to ignore specific constraints
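Log-Linear Law: An empirical relationship reported in this paper: the hallucination rate is linear in the logarithms of knowledge popularity, knowledge length, and model size, i.e., Hallucination Rate ~ log(Popularity) + log(Length) + log(Model Size)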
CoDa (Contrastive Decoding to Amplify Overshadowed Knowledge): The proposed decoding strategy, which boosts the probability of tokens that are valid but suppressed by dominant associations
Contrastive Decoding: A generation method that selects tokens maximizing the difference between the log-probabilities assigned by a strong expert model and a weaker amateur model (in CoDa, the contrast is instead between the original prompt and modified prompts)
Mutual Information: A measure used here to quantify the dependence between the prompt and the next token; it helps identify which parts of the prompt the model is ignoring (i.e., which are overshadowed)
Knowledge Popularity: The relative frequency of a specific fact or entity within the training corpus
Knowledge Length: The length (in tokens) of the specific distinguishing knowledge relative to the shared context
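To make the Log-Linear Law entry concrete, the sketch below fits the form R = a*log(P) + b*log(L) + c*log(S) + d by ordinary least squares. The coefficients a, b, c, d, the helper names (predict_rate, fit_log_linear, solve_linear), and the sample grid are all hypothetical: the data is noise-free and synthetic, generated from made-up coefficients, and the fit simply checks that they are recovered. It is not the paper's fitting procedure and uses no real measurements.

```python
import math

# Hypothetical "true" coefficients (a, b, c, d), invented for this demo only.
TRUE_COEF = (0.05, 0.03, 0.02, 0.10)

def predict_rate(popularity, length, model_size, coef):
    """Evaluate R = a*log(P) + b*log(L) + c*log(S) + d."""
    a, b, c, d = coef
    return (a * math.log(popularity) + b * math.log(length)
            + c * math.log(model_size) + d)

def solve_linear(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_log_linear(samples):
    """Least-squares fit of (a, b, c, d) from (P, L, S, rate) tuples via the normal equations."""
    X = [[math.log(pop), math.log(length), math.log(size), 1.0]
         for pop, length, size, _ in samples]
    y = [rate for *_, rate in samples]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(4)]
    return solve_linear(XtX, Xty)

# Noise-free synthetic grid (not real measurements), so the fit should recover TRUE_COEF.
samples = [
    (pop, length, size, predict_rate(pop, length, size, TRUE_COEF))
    for pop in (1, 10, 100)
    for length in (1, 2, 4)
    for size in (1, 8, 64)
]
fitted = fit_log_linear(samples)
```

Because the law is linear in log space, doubling any one factor shifts the predicted rate by the same amount (for popularity, a*log 2) regardless of its starting value.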
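The Contrastive Decoding and CoDa entries can be illustrated with a minimal sketch of the generic expert-minus-amateur rule. Following the glossary, the "amateur" here is the same model conditioned on a modified prompt with the distinguishing constraint removed. The plausibility cutoff `alpha`, the function name `contrastive_pick`, and the toy distributions are assumptions for illustration. CoDa's exact scoring may differ.

```python
import math

def contrastive_pick(p_expert, p_amateur, alpha=0.1, eps=1e-12):
    """Pick the token maximizing log p_expert - log p_amateur among plausible tokens.

    A token is plausible only if p_expert(token) >= alpha * max(p_expert); the cutoff
    keeps the contrast from promoting tokens the expert itself considers unlikely.
    `alpha` is a hypothetical hyperparameter value, not one taken from the paper.
    """
    cutoff = alpha * max(p_expert.values())
    plausible = [tok for tok, p in p_expert.items() if p >= cutoff]

    def score(tok):
        # eps floor avoids log(0) when the amateur assigns no mass to a token.
        return math.log(p_expert[tok]) - math.log(max(p_amateur.get(tok, 0.0), eps))

    return max(plausible, key=score)

# Toy next-token distributions, invented for illustration.
# Expert: model given the full prompt, including the distinguishing constraint.
expert = {"dominant": 0.55, "constrained": 0.40, "rare": 0.05}
# Amateur: same model given a modified prompt with the constraint removed.
amateur = {"dominant": 0.80, "constrained": 0.10, "rare": 0.10}

greedy_choice = max(expert, key=expert.get)             # "dominant"
contrastive_choice = contrastive_pick(expert, amateur)  # "constrained"
```

Greedy decoding follows the dominant association. The contrastive score instead favors the token whose probability depends most on the constraint, which is the behavior CoDa aims to amplify. The cutoff excludes "rare" (0.05 < 0.1 * 0.55) even though the amateur also gives it little mass.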
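For the Mutual Information entry, a hypothetical way to spot overshadowed prompt parts is to mask one span at a time and measure how much the next-token distribution moves. The sketch scores each token by pointwise mutual information, approximating p(token) by the distribution under the masked prompt. It then scores each span by the expected PMI under the full-prompt distribution, which equals KL(p_full || p_masked). The masking scheme, the KL-based span score, the helper names, and the toy distributions are all assumptions, not the paper's procedure.

```python
import math

def pmi(token, p_with, p_without, eps=1e-12):
    """PMI estimate: log p(token | full prompt) - log p(token | prompt with span masked)."""
    return (math.log(max(p_with.get(token, 0.0), eps))
            - math.log(max(p_without.get(token, 0.0), eps)))

def span_influence(p_with, p_without):
    """Expected PMI under the full-prompt distribution, i.e. KL(p_with || p_without)."""
    return sum(p * pmi(tok, p_with, p_without) for tok, p in p_with.items() if p > 0)

def rank_spans(p_full, masked):
    """Order prompt spans from least to most influential; low scores suggest overshadowing."""
    return sorted(masked, key=lambda span: span_influence(p_full, masked[span]))

# Toy distributions, invented for illustration.
p_full = {"dominant": 0.7, "constrained": 0.3}
masked = {
    # Masking the constraint barely changes the output: the model is ignoring it.
    "constraint_span": {"dominant": 0.72, "constrained": 0.28},
    # Masking the shared context changes the output a lot.
    "context_span": {"dominant": 0.2, "constrained": 0.8},
}
ranking = rank_spans(p_full, masked)  # ["constraint_span", "context_span"]
```

A span whose removal leaves the distribution almost unchanged carries little information about the next token. That is the signature of a constraint being overshadowed.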