ICM: Internal Coherence Maximization—the proposed unsupervised algorithm that labels data by maximizing mutual predictability and logical consistency
Mutual Predictability: A measure of how confidently a model can predict the label of one example when conditioned on the labels of other examples in the dataset
Logical Consistency: Constraints applied to labels to prevent contradictions (e.g., if answer A is correct, answer B cannot also be correct for the same question)
Simulated Annealing: A probabilistic optimization algorithm used here to search for the best set of labels by iteratively accepting or rejecting changes based on a scoring function
In-context learning: The ability of a model to perform a task by conditioning on examples provided within the prompt, used here to estimate mutual predictability
Reward Model (RM): A model trained to predict a scalar score indicating the quality or preference of a response, usually used to guide reinforcement learning
RLHF: Reinforcement Learning from Human Feedback—a standard method for aligning language models using human preference labels
Golden Labels: Ground truth labels provided by the dataset creators or experts, used as a ceiling for performance comparisons
Constitutional AI: A method for aligning AI systems using a set of principles (a constitution) and AI feedback rather than direct human labels
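The first four terms above fit together as one search loop: simulated annealing proposes label flips, and each candidate labeling is scored by mutual predictability minus a penalty for logical inconsistencies. The sketch below is a toy illustration of that loop, not the paper's implementation: the real mutual-predictability term comes from a language model's in-context log-probabilities, which is replaced here with a simple similarity-agreement proxy, and all names (`icm_anneal`, `similarity`, `exclusive_pairs`) are illustrative.

```python
import math
import random

def mutual_predictability(labels, similarity):
    # Toy proxy: reward pairs of similar examples that share a label.
    # (In ICM proper, this is the model's in-context log-probability of
    # each label conditioned on all the other labeled examples.)
    n = len(labels)
    return sum(similarity[i][j]
               for i in range(n) for j in range(n)
               if i != j and labels[i] == labels[j])

def consistency_penalty(labels, exclusive_pairs, weight=10.0):
    # Logical consistency: mutually exclusive answers (e.g., two different
    # answers to the same question) cannot both be labeled True.
    violations = sum(1 for i, j in exclusive_pairs if labels[i] and labels[j])
    return weight * violations

def icm_anneal(n, similarity, exclusive_pairs, steps=500, t0=2.0, seed=0):
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in range(n)]

    def score(ls):
        return mutual_predictability(ls, similarity) \
             - consistency_penalty(ls, exclusive_pairs)

    current = score(labels)
    for step in range(steps):
        temp = t0 / (1 + step)          # simple cooling schedule
        i = rng.randrange(n)
        labels[i] = not labels[i]       # propose: flip one label
        new = score(labels)
        # Accept improvements always; accept worsenings with probability
        # exp(delta / temp), which shrinks as the temperature cools.
        if new >= current or rng.random() < math.exp((new - current) / max(temp, 1e-9)):
            current = new
        else:
            labels[i] = not labels[i]   # reject: revert the flip
    return labels, current
```

A usage example with four examples, where examples 0/1 and 2/3 are mutually similar and labels 0 and 2 are mutually exclusive: the search settles on a labeling where similar examples agree and no exclusive pair is jointly True.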