_comment: REQUIRED: Define ALL technical terms, acronyms, and method names used ANYWHERE in the entire summary. After drafting the summary, perform a MANDATORY POST-DRAFT SCAN: check every section individually (Core.one_sentence_thesis, evaluation_highlights, core_problem, Technical_details, Experiments.key_results notes, Figures descriptions and key_insights). HIGH-VISIBILITY RULE: Terms appearing in one_sentence_thesis, evaluation_highlights, or figure key_insights MUST be defined—these are the first things readers see. COMMONLY MISSED: PPO, DPO, MARL, dense retrieval, silver labels, cosine schedule, clipped surrogate objective, Top-k, greedy decoding, beam search, logit, ViT, CLIP, Pareto improvement, BLEU, ROUGE, perplexity, attention heads, parameter sharing, warm start, convex combination, sawtooth profile, length-normalized attention ratio, NTP. If in doubt, define it.
Categorical Consistency: A metric measuring the proportion of properties that correctly change to match a subject's newly assigned category after an edit
Invariance: A metric measuring the proportion of properties that correctly remain unchanged after an edit because they are shared by both the old and new categories
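ROME: Rank-One Model Editing, a method that locates a specific factual association in a mid-layer MLP (multilayer perceptron, the feed-forward block of a transformer layer) weight matrix, treats that matrix as a key-value memory, and overwrites the association with a closed-form rank-one weight update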
ICE: In-Context Editing—a method that prepends the desired edit (e.g., 'Imagine that a cobra is a dog') to the prompt context rather than changing model weights
EasyEdit: An open-source software framework used to implement and evaluate various knowledge editing methods
FT: Finetuning—updating model weights via gradient descent on the specific edit example
Taxonomy: A hierarchical classification system (e.g., Animal -> Dog -> Labrador) where lower levels inherit properties from higher levels
Superordinate category: A high-level category grouping (e.g., 'Animals', 'Vehicles') that contains the specific categories used in the edits
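CounterFact: A prior knowledge editing dataset built from counterfactual factual updates; unlike TAXI (the categorical knowledge editing benchmark these terms describe), it lacks a structure in which a subject's properties are inherited from its category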
MQuAKE: Multi-hop Question Answering for Knowledge Editing, a benchmark evaluating whether edits propagate through multi-hop reasoning chains (questions whose answers require chaining several facts, at least one of which was edited)
RippleEdits: A benchmark measuring the 'ripple effects' of edits, i.e., whether facts logically related to an edited fact are updated consistently with the primary edit
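The Categorical Consistency and Invariance entries can be made concrete with a minimal Python sketch. It assumes each probed property is stored as a hypothetical `PropertyProbe` record holding the value implied by the old category, the value implied by the new category, and the edited model's answer, and it scores answers by exact string match; the benchmark's actual answer scoring may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PropertyProbe:
    """Hypothetical record of one property query asked after an edit."""
    old_value: str     # value implied by the subject's original category
    new_value: str     # value implied by the newly assigned category
    model_answer: str  # what the edited model answered (exact match assumed)


def categorical_consistency(probes: List[PropertyProbe]) -> float:
    """Fraction of category-specific properties (old != new) now answered with the new category's value."""
    changing = [p for p in probes if p.old_value != p.new_value]
    if not changing:
        return float("nan")  # undefined when no property should change
    return sum(p.model_answer == p.new_value for p in changing) / len(changing)


def invariance(probes: List[PropertyProbe]) -> float:
    """Fraction of properties shared by both categories (old == new) that are still answered correctly."""
    shared = [p for p in probes if p.old_value == p.new_value]
    if not shared:
        return float("nan")  # undefined when no property is shared
    return sum(p.model_answer == p.old_value for p in shared) / len(shared)


# Edit "a cobra is a dog" (illustrative values, not benchmark data).
probes = [
    PropertyProbe("hiss", "bark", "bark"),  # sound: changed correctly
    PropertyProbe("0", "4", "0"),           # number of legs: failed to change
    PropertyProbe("yes", "yes", "yes"),     # is an animal: shared, kept
    PropertyProbe("yes", "yes", "no"),      # has a backbone: shared, wrongly flipped
]
print(categorical_consistency(probes))  # 0.5
print(invariance(probes))               # 0.5
```

The two metrics partition the probes by whether the old and new categories disagree, so a method that simply flips every answer can score well on one and poorly on the other.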
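The rank-one update behind the ROME entry can be sketched in plain Python. This is a simplified illustration rather than ROME's implementation. It assumes the key covariance is the identity, whereas ROME weights the update by a covariance estimated from many keys. It also takes the subject's key vector k* and the target value v* as given, whereas ROME derives both from the model's activations. The helper names are hypothetical. The update is W' = W + (v* − W k*) k*ᵀ / (k*ᵀ k*), which maps k* exactly to v* and leaves keys orthogonal to k* unchanged.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W: Matrix, k_star: Vector, v_star: Vector) -> Matrix:
    """Return W + (v* - W k*) k*^T / (k*^T k*), which maps k* exactly to v*."""
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]
    norm_sq = sum(k * k for k in k_star)
    # Row i gains residual[i] * k*^T / ||k*||^2: an outer product, hence rank one.
    return [
        [w + r * k / norm_sq for w, k in zip(row, k_star)]
        for row, r in zip(W, residual)
    ]


W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]      # toy MLP projection: 3-dim keys -> 2-dim values
k_star = [1.0, 1.0, 0.0]   # key standing in for the edited subject
v_star = [5.0, -1.0]       # value standing in for the new fact

W_new = rank_one_edit(W, k_star, v_star)
print(matvec(W_new, k_star))            # [5.0, -1.0]: the edited key now retrieves v*
print(matvec(W_new, [1.0, -1.0, 0.0]))  # [1.0, -1.0]: an orthogonal key is unaffected
```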
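The ICE entry changes only the input, never the weights. A minimal sketch of that prompt construction follows. The template wording follows the example in the ICE entry, the function name is hypothetical, and the exact template used by ICE or EasyEdit may differ.

```python
def in_context_edit_prompt(subject: str, new_category: str, query: str) -> str:
    """Prepend the edit as an instruction; the model's weights are never touched."""
    edit = f"Imagine that a {subject} is a {new_category}."  # template is an assumption
    return f"{edit} {query}"


prompt = in_context_edit_prompt("cobra", "dog", "A cobra makes a sound called a")
print(prompt)  # Imagine that a cobra is a dog. A cobra makes a sound called a
```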
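The property inheritance described in the Taxonomy entry can be sketched as a parent map plus the properties each level defines itself. The categories and properties below are illustrative, not taken from the benchmark. Lower levels override higher ones when both define a property.

```python
from typing import Dict, Optional

# Illustrative taxonomy: each category points to its parent (None at the root).
PARENT: Dict[str, Optional[str]] = {"Animal": None, "Dog": "Animal", "Labrador": "Dog"}

# Properties each level defines itself; everything else is inherited.
OWN_PROPERTIES: Dict[str, Dict[str, str]] = {
    "Animal": {"is alive": "yes"},
    "Dog": {"sound": "bark", "legs": "4"},
    "Labrador": {"coat": "short"},
}


def inherited_properties(category: str) -> Dict[str, str]:
    """Collect properties from the root down, so lower levels override higher ones."""
    chain = []
    node: Optional[str] = category
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    merged: Dict[str, str] = {}
    for level in reversed(chain):  # root first, most specific last
        merged.update(OWN_PROPERTIES.get(level, {}))
    return merged


print(inherited_properties("Labrador"))
# {'is alive': 'yes', 'sound': 'bark', 'legs': '4', 'coat': 'short'}
```

Under this view, moving a subject to a new category changes exactly the properties whose inherited values differ between the two categories, which are the ones Categorical Consistency scores. The properties the two categories share are the ones Invariance scores.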