
Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu
arXiv (2026)
Agent RL

📝 Paper Summary

Multi-agent Value Alignment
VAS-CFA aligns LLMs by instantiating five distinct moral agents and fusing their outputs using Combinatorial Fusion Analysis to capture ethical pluralism better than single-agent methods.
Core Problem
Existing alignment methods like RLHF rely on single evaluators or narrow reward signals, failing to capture ethical pluralism and often producing evasive or generic responses.
Why it matters:
  • Models pretrained on web corpora can produce unsafe or untruthful outputs if not aligned with diverse human values
  • Single-agent RLHF approaches risk overfitting to narrow objectives, missing crucial ethical complexity and cognitive diversity
  • Direct aggregation of multi-agent outputs often leads to semantic conflicts and diluted answers
Concrete Example: When asked a complex moral question, a standard model might give a generic, safe answer. In VAS-CFA, an agent focused on 'Care' might prioritize health ('Ensure your child grows up healthy'), while an agent focused on 'Authority' might prioritize rules. Naively merging these full-text outputs produces incoherent answers; VAS-CFA instead extracts distinct moral units from each response and ranks them to reach the best consensus.
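The following minimal Python sketch shows how score-based and rank-based fusion over moral units can disagree. It implements average score combination (ASC) and average rank combination (ARC). The units, agent scores, and helper names (`to_ranks`, `average_score_combination`, `average_rank_combination`) are hypothetical and only illustrate the idea. The paper's unit extraction and scoring are not reproduced here.

```python
from statistics import mean

# Hypothetical moral units and agent scores, chosen only for illustration.
UNITS = [
    "Ensure your child grows up healthy",    # Care-leaning unit (from the example)
    "Respect the household rules",           # Authority-leaning unit (hypothetical)
    "Apply the same rule to every sibling",  # Fairness-leaning unit (hypothetical)
]
SCORES = {
    "Care":      [0.99, 0.10, 0.20],
    "Authority": [0.40, 0.90, 0.50],
    "Fairness":  [0.55, 0.35, 0.92],
}


def to_ranks(scores):
    """Convert scores to ranks (1 = best); sorted() is stable, so ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def average_score_combination(scores_by_agent):
    """ASC: mean score per unit across agents (higher is better)."""
    return [mean(col) for col in zip(*scores_by_agent.values())]


def average_rank_combination(scores_by_agent):
    """ARC: mean rank per unit across agents (lower is better)."""
    ranks = [to_ranks(s) for s in scores_by_agent.values()]
    return [mean(col) for col in zip(*ranks)]


asc = average_score_combination(SCORES)
arc = average_rank_combination(SCORES)
best_by_score = UNITS[max(range(len(UNITS)), key=lambda i: asc[i])]
best_by_rank = UNITS[min(range(len(UNITS)), key=lambda i: arc[i])]
print("ASC picks:", best_by_score)
print("ARC picks:", best_by_rank)
```

With these numbers, ASC selects the Care unit because one agent gives it a very high score. ARC selects the Fairness unit, which every agent ranks first or second. Converting scores to ranks limits how much a single agent's extreme score can dominate the consensus.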
Key Novelty
Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA)
  • Instantiates five separate 'moral agents' (Authority, Care, Fairness, Loyalty, Sanctity) fine-tuned via DPO to represent distinct normative perspectives
  • Decomposes agent responses into atomic 'moral units' rather than aggregating full text, preventing semantic incoherence
  • Applies Combinatorial Fusion Analysis (CFA) to score and rank these units, using each agent's diversity strength to weight its contribution and combining the agents' judgments non-linearly
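The diversity-weighted step can be sketched as follows, assuming the standard CFA definitions from earlier work by Hsu and colleagues:

- Each agent is a scoring system over the candidate units.
- Its rank-score characteristic (RSC) function f(i) gives the normalized score of the unit it ranks i-th.
- The cognitive diversity of two agents is the root-mean-square gap between their RSC functions.
- An agent's diversity strength (DS) is its average cognitive diversity to all other agents.

Several parts of this sketch are assumptions. The scores are hypothetical, and min-max normalization is assumed. Weighting ranks directly by DS in WRCDS is a simplifying choice that may differ from the paper's exact formula.

```python
import math
from itertools import combinations

# Hypothetical scores each of the five moral agents gives to four candidate moral units.
AGENT_SCORES = {
    "Authority": [0.20, 0.90, 0.40, 0.10],
    "Care":      [0.95, 0.20, 0.60, 0.30],
    "Fairness":  [0.50, 0.30, 0.90, 0.20],
    "Loyalty":   [0.30, 0.70, 0.50, 0.60],
    "Sanctity":  [0.60, 0.40, 0.50, 0.10],
}


def normalize(scores):
    """Min-max normalize scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def to_ranks(scores):
    """Ranks with 1 = best; ties keep input order because sorted() is stable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_score_function(scores):
    """RSC function: f(i) = normalized score of the unit placed at rank i."""
    return sorted(normalize(scores), reverse=True)


def cognitive_diversity(a, b):
    """Root-mean-square difference between two agents' RSC functions."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


def diversity_strength(systems):
    """DS(A): average cognitive diversity between agent A and every other agent."""
    pairwise = {name: [] for name in systems}
    for a, b in combinations(systems, 2):
        d = cognitive_diversity(systems[a], systems[b])
        pairwise[a].append(d)
        pairwise[b].append(d)
    return {name: sum(ds) / len(ds) for name, ds in pairwise.items()}


def weighted_combination(systems, use_ranks):
    """WSCDS (use_ranks=False, higher is better) or WRCDS (use_ranks=True, lower is better)."""
    weights = diversity_strength(systems)
    if sum(weights.values()) == 0:  # identical RSC functions: fall back to equal weights
        weights = {name: 1.0 for name in systems}
    total = sum(weights.values())
    per_agent = {n: (to_ranks(s) if use_ranks else normalize(s)) for n, s in systems.items()}
    n_units = len(next(iter(systems.values())))
    return [sum(weights[n] * per_agent[n][i] for n in systems) / total for i in range(n_units)]


wscds = weighted_combination(AGENT_SCORES, use_ranks=False)
wrcds = weighted_combination(AGENT_SCORES, use_ranks=True)
print("DS:", {name: round(v, 3) for name, v in diversity_strength(AGENT_SCORES).items()})
print("WSCDS best unit:", max(range(len(wscds)), key=lambda i: wscds[i]))
print("WRCDS best unit:", min(range(len(wrcds)), key=lambda i: wrcds[i]))
```

In this sketch, agents whose score distributions differ most from the others receive larger weights. This lets distinctive perspectives influence the fused result more than redundant ones.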
Architecture
Figure 1: The complete VAS-CFA workflow, from multi-agent generation to the final paraphrased output.
Evaluation Highlights
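  • Rank-based combinations (ARC: average rank combination; WRCDS: weighted rank combination by diversity strength) consistently outperform score-based combinations (ASC, WSCDS), which the authors attribute to cognitive diversity among the agents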
  • VAS-CFA outperforms single moral agents and previous multi-agent baselines (CVA-GS) on F1 ROUGE-L and F1 BERTScore metrics
  • Five distinct agents exhibit measurable cognitive diversity across the test set, validating the multi-perspective approach
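ROUGE-L, one of the two reported metrics, scores a response by the longest common subsequence (LCS) of tokens it shares with a reference. Below is a simplified Python sketch of ROUGE-L F1 using lowercased whitespace tokens and no stemming. It is not the exact implementation used in the paper. Evaluations typically rely on a standard package such as Google Research's `rouge-score`. BERTScore needs a pretrained contextual embedding model and is not sketched. The example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Longest-common-subsequence length of two token lists via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1: lowercase whitespace tokens, no stemming or punctuation handling."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("Ensure your child grows up healthy",
                   "Make sure your child grows up healthy and safe")
print(f"ROUGE-L F1: {score:.3f}")  # prints "ROUGE-L F1: 0.667"
```

LCS rewards in-order matches without requiring them to be contiguous. As a result, ROUGE-L tolerates inserted words that would break exact n-gram matches.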
Breakthrough Assessment
7/10
Novel integration of Combinatorial Fusion Analysis with multi-agent LLM alignment. While it demonstrates improvements, it relies on a fixed set of five moral foundations and on automatic metrics (ROUGE/BERTScore) rather than human evaluation of the fused output.