implicit reasoning: The ability of a language model to draw new conclusions (deduce missing edges) from knowledge seen during pretraining, without any explicit chain-of-thought training
graph search entropy: A metric quantifying the complexity of a knowledge graph, calculated as the entropy rate of a maximal entropy random walk over the graph
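A minimal sketch of this quantity, assuming the standard result that the entropy rate of a maximal entropy random walk equals log of the largest eigenvalue (spectral radius) of the graph's adjacency matrix; the toy graph here is a hypothetical example, not one from the paper:

```python
import numpy as np

# Entropy rate of a maximal-entropy random walk = log(lambda_max),
# where lambda_max is the spectral radius of the adjacency matrix.
# Toy undirected graph: a triangle (nodes 0,1,2) plus a pendant node 3.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

lambda_max = max(abs(np.linalg.eigvals(A)))  # spectral radius
entropy_rate = float(np.log(lambda_max))
print(entropy_rate)
```

Denser, more interconnected graphs have a larger spectral radius, so this metric grows with graph complexity.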
deducible triples: Triples in a knowledge graph that can be inferred from other triples using a set of logical rules (e.g., transitivity)
atomic triples: Triples in a knowledge graph that cannot be inferred from other triples and must be memorized
optimal model size: The specific model parameter count that achieves the minimum testing loss for a given dataset, derived from the bottom of the U-shaped loss curve
preferential attachment: A graph generation process where new nodes prefer to attach to existing nodes with high degrees, creating scale-free networks
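The process can be sketched as a small Barabási–Albert-style generator (a simplified illustration, not the paper's exact construction); sampling uniformly from a list that repeats each node once per incident edge is equivalent to degree-proportional selection:

```python
import random

def preferential_attachment(n, m, seed=0):
    """Generate a graph where each new node attaches m edges to
    existing nodes with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m+1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks nodes in proportion to their degree.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs += [new, t]
    return edges

edges = preferential_attachment(50, 2)
```

Early nodes accumulate edges fastest, producing the heavy-tailed (scale-free) degree distribution.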
FB15K-237: A standard benchmark dataset for knowledge graph completion, derived from Freebase
broken neural scaling law: A deviation from power-law scaling in which performance changes non-monotonically with scale (e.g., double-descent or U-shaped curves)
inverse scaling: A phenomenon where larger models perform worse than smaller models on specific tasks
transitive rule: A logic rule where A->B and B->C imply A->C (e.g., ancestor relationships)
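Applying the transitive rule to a fixpoint is what turns atomic triples into deducible ones. A minimal sketch, using a hypothetical ancestor relation as the example:

```python
def transitive_closure(edges):
    """Repeatedly apply A->B, B->C => A->C until no new edge is added."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

# Atomic facts: alice is an ancestor of bob, bob of carol.
facts = {("alice", "bob"), ("bob", "carol")}
closure = transitive_closure(facts)
# The deducible triple ("alice", "carol") is added by the rule.
```

Triples in the closure but not in the original fact set are exactly the deducible ones; the rest must be memorized.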