CFA: Combinatorial Fusion Analysis—a framework for combining multiple scoring systems (agents) using rank-score functions and diversity measurements
DPO: Direct Preference Optimization—an alignment method that optimizes a policy directly on preference data without training a separate reward model
RLHF: Reinforcement Learning from Human Feedback, an alignment method that trains a reward model on human preference data and then optimizes the LLM against that reward model with reinforcement learning
Moral Integrity Corpus (MIC): A dataset of prompt-response pairs with human-revised answers and ethical annotations, used here for fine-tuning
QLoRA: Quantized Low-Rank Adaptation, a parameter-efficient fine-tuning method that trains small low-rank adapter weights on top of a frozen base model quantized to 4-bit precision, reducing memory usage
Cognitive Diversity: A measure of the difference between the rank-score functions of two scoring systems (agents)
Kemeny Rank Space: A mathematical space representing all possible rankings, including those with ties, used to model the aggregation of agent preferences
ROUGE-L: A metric that scores generated text by the length of its longest common subsequence with a reference, used to evaluate content overlap
BERTScore: A metric that computes similarity between generated text and references using contextual embeddings
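
To make the Cognitive Diversity entry above concrete, here is a minimal Python sketch. The glossary only says that the measure compares the rank-score functions of two agents. The specific choices below (min-max score normalization and a root-mean-square distance between the two rank-score functions) are assumptions made for illustration, as are the function names `rank_score_function` and `cognitive_diversity`. They are not necessarily the exact formulation used in this work.

```python
import math


def rank_score_function(scores):
    """Return the rank-score function as a list: entry i-1 is the
    (min-max normalized) score of the item placed at rank i.

    Assumption: scores are normalized to [0, 1] before comparison.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is identical, map them all to 0.0 to avoid dividing by zero.
    normalized = [(s - lo) / span if span else 0.0 for s in scores]
    # Rank 1 is the highest score, so sort in descending order.
    return sorted(normalized, reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """Root-mean-square distance between two agents' rank-score functions.

    Assumption: RMS distance is used as the diversity measure.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both agents must score the same number of items")
    f_a = rank_score_function(scores_a)
    f_b = rank_score_function(scores_b)
    n = len(f_a)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f_a, f_b)) / n)
```

Under these assumptions the measure depends only on how each agent's scores are distributed across ranks, not on which item holds which rank. Two agents that assign the same set of scores to different items therefore have a diversity of zero.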