
HelpSteer2: Open-source dataset for training top-performing reward models

Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev
NVIDIA
Neural Information Processing Systems (2024)
RL · Benchmark · Factuality

📝 Paper Summary

Reward Modeling · Preference Datasets · RLHF
HelpSteer2 is a permissively licensed, multi-attribute preference dataset that enables training state-of-the-art reward models using only 10,000 high-quality, human-annotated prompt-response pairs.
Core Problem
Existing permissively licensed preference datasets (e.g., HH-RLHF) are outdated, while high-quality synthetic datasets often carry restrictive commercial licenses preventing their use in proprietary model development.
Why it matters:
  • Training aligned LLMs requires high-quality preference data, but proprietary models (e.g., GPT-4) restrict the commercial use of their outputs.
  • Current open datasets like Open Assistant lack the quality and detailed attribute labeling needed for modern SOTA reward modeling.
  • Lack of transparency in training data for models like Llama 3 hinders community reproduction of alignment techniques.
Concrete Example: While Llama 2 utilized over 1 million binary comparisons, HelpSteer2 achieves high performance with just 10k samples by using detailed 5-attribute scoring (e.g., separating 'Verbosity' from 'Helpfulness') rather than simple binary choices.
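The attribute-scoring idea above can be sketched in a few lines: collapse the five per-attribute scores into a scalar reward via a weighted sum, from which a binary comparison falls out for free. The weights below are hypothetical illustrations, not the paper's tuned values.

```python
# Illustrative sketch: turning five attribute scores (0-4 scale) into a
# scalar reward and a derived binary preference. Weights are hypothetical.
WEIGHTS = {
    "helpfulness": 1.0,
    "correctness": 0.5,
    "coherence": 0.25,
    "complexity": 0.0,
    "verbosity": -0.25,  # penalize verbosity instead of rewarding length
}

def scalar_reward(scores: dict) -> float:
    """Collapse per-attribute scores into a single reward value."""
    return sum(WEIGHTS[attr] * val for attr, val in scores.items())

def prefer(scores_a: dict, scores_b: dict) -> str:
    """Derive a binary 'better/worse' comparison from attribute annotations."""
    return "A" if scalar_reward(scores_a) >= scalar_reward(scores_b) else "B"
```

Because the attributes are kept separate in the data, the weighting can be changed after collection, e.g. to penalize verbosity without relabeling anything.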
Key Novelty
Dense Multi-Attribute Human Annotation with Strict Agreement Filtering
  • Replaces simple binary 'better/worse' labels with 5 specific attributes (Helpfulness, Correctness, Coherence, Complexity, Verbosity) on a 5-point Likert scale.
  • Enforces high data quality by requiring 3+ annotators per sample and discarding any data where annotator disagreement exceeds 2 points.
  • Collects responses from a diverse mix of models (Nemotron-2/3/4, Mixtral) to ensure broad coverage of response styles.
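The agreement-filtering step described above can be sketched as follows. This is a minimal illustration assuming each sample carries per-annotator score dictionaries; the field names and data layout are hypothetical, not the released schema.

```python
# Hypothetical sketch of HelpSteer2-style quality filtering: require 3+
# annotators, drop samples where any attribute's scores spread by more
# than 2 points, then average the surviving scores per attribute.
from statistics import mean

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
MAX_DISAGREEMENT = 2

def filter_and_aggregate(samples):
    """Keep well-agreed samples and average their per-attribute scores."""
    kept = []
    for sample in samples:
        scores = sample["annotations"]  # one dict of attribute scores per annotator
        if len(scores) < 3:
            continue  # require 3+ annotators per sample
        spreads = {
            attr: max(s[attr] for s in scores) - min(s[attr] for s in scores)
            for attr in ATTRIBUTES
        }
        if all(spread <= MAX_DISAGREEMENT for spread in spreads.values()):
            kept.append({
                "prompt": sample["prompt"],
                "response": sample["response"],
                **{attr: mean(s[attr] for s in scores) for attr in ATTRIBUTES},
            })
    return kept
```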
Evaluation Highlights
  • Achieves a SOTA score (92.0%) on RewardBench's primary dataset, outperforming the listed open and proprietary models as of June 2024.
  • Dataset efficiency: Uses only ~10,000 response pairs, an order of magnitude fewer than HH-RLHF (~160k), to achieve top performance.
  • Inter-annotator agreement for 'Helpfulness' improved to 0.706 (Cohen's kappa) through strict quality control, compared to 0.465 in the initial collection.
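The Cohen's kappa figures quoted above measure chance-corrected agreement between two raters. A minimal from-scratch version (not the paper's code) makes the statistic concrete:

```python
# Minimal Cohen's kappa for two annotators labeling the same items on a
# Likert scale: observed agreement, corrected for the agreement two
# independent raters would reach by chance.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)
```

A kappa of 0.706 sits in the range conventionally read as "substantial" agreement, versus "moderate" for the initial 0.465.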
Breakthrough Assessment
8/10
Highly significant contribution due to the release of a commercially viable (CC-BY-4.0), SOTA-enabling dataset. It demonstrates that data quality and dense, attribute-level signal can outweigh massive scale.