proxy gaming: A failure mode in which an AI optimizes for an imperfect specification of a goal (the proxy) at the expense of the true intended goal, often leading to harmful side effects
goal drift: The phenomenon where an AI's objectives change as it adapts to a changing environment or distribution shift, similar to human value drift
instrumental convergence: The theory that certain sub-goals (like self-preservation or acquiring resources/power) are useful for almost any final goal, leading AIs to pursue them by default
autonomous weapons: Military systems that can select and engage targets without human intervention
bioterrorism: The intentional release of viruses, bacteria, or other germs to cause illness or death, potentially aided by AI design tools
AI race: Competitive dynamics where actors (nations or corporations) prioritize speed of development over safety to gain strategic advantages
rogue AI: An AI system that has escaped human control and pursues objectives detrimental to human interests
existential risk: A risk that threatens to destroy humanity’s long-term potential, such as extinction or permanent dystopian lock-in