
Boosting LLM via Learning from Data Iteratively and Selectively

Q Jia, S Ren, Z Qin, F Xue, J Ni, Y You
Shanghai Jiao Tong University, The Hong Kong Polytechnic University, National University of Singapore, University of Science and Technology of China, Harbin Institute of Technology (Shenzhen)
arXiv, 12/2024
Reasoning Benchmark

📝 Paper Summary

Data Selection for Instruction Tuning · Data Efficiency in LLM Training
IterIT improves instruction tuning by iteratively re-evaluating data complexity during training and greedily selecting diverse samples based on response content rather than just instructions.
Core Problem
Existing data selection methods compute static complexity scores before training, failing to adapt to the model's evolving capabilities, and often measure diversity based on instructions rather than informative responses.
Why it matters:
  • Models' perception of difficulty shifts during training: 55% of samples rated 'hard' after one epoch were not rated hard before training began
  • Different instructions can yield similar, uninformative responses, reducing training efficiency
  • Selecting a small, high-quality subset can match or exceed full-dataset performance while significantly reducing training costs
Concrete Example: A model might find a physics problem difficult at epoch 0 but easy at epoch 1. Static selection methods would keep training on it, wasting compute, whereas IterIT detects the reduced difficulty and swaps it for a currently harder sample.
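The static complexity score in question is the Instruction-Following Difficulty (IFD) score the method re-computes each epoch. IFD is commonly defined as the ratio of the model's average loss on the response conditioned on the instruction to its loss on the response alone; a higher ratio means the instruction helps less, so the sample is harder for the current model. A minimal sketch, assuming per-token log-probabilities of the response tokens are already available (the `ifd_score` helper and the toy numbers are illustrative, not the paper's code):

```python
def ifd_score(cond_logprobs, uncond_logprobs):
    """IFD = loss(response | instruction) / loss(response).

    Both inputs are per-token log-probabilities of the *response* tokens,
    with and without the instruction in the context. Higher IFD means
    the instruction contributes less, i.e. the sample is 'harder'.
    """
    cond_loss = -sum(cond_logprobs) / len(cond_logprobs)
    uncond_loss = -sum(uncond_logprobs) / len(uncond_logprobs)
    return cond_loss / uncond_loss

# Toy numbers: for the easy sample, the instruction makes the response
# far more predictable; for the hard one, it barely helps.
easy = ifd_score([-0.2, -0.1, -0.3], [-2.0, -1.5, -1.8])
hard = ifd_score([-1.9, -1.6, -1.7], [-2.0, -1.5, -1.8])
```

Because the losses depend on the current model weights, these scores drift as training proceeds, which is exactly why a one-shot, pre-training ranking goes stale.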
Key Novelty
Iterative Complexity-Diversity Selection (IterIT)
  • Re-calculates the Instruction-Following Difficulty (IFD) score for a candidate subset after every epoch to capture the model's dynamic learning progress
  • Measures diversity using TF-IDF on *responses* (not instructions) to ensure the selected subset covers diverse, informative answers
  • Uses a coarse-to-fine strategy: filters candidates globally first, then iteratively re-scores a smaller pool to keep computational cost affordable
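The complexity-diversity trade-off in the greedy step can be sketched as follows. Each candidate's gain is its IFD score minus a penalty for resembling already-selected *responses*; `select_subset`, the `lam` weight, and the tiny whitespace-token TF-IDF helper are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """TF-IDF over lowercased whitespace tokens (deliberately minimal)."""
    tokenized = [t.lower().split() for t in texts]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(texts)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [
        {w: (c / len(toks)) * idf[w] for w, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_subset(responses, ifd_scores, k, lam=0.5):
    """Greedily pick k samples maximizing IFD minus a redundancy
    penalty: max cosine similarity to responses already chosen."""
    vecs = tfidf_vectors(responses)
    chosen, remaining = [], list(range(len(responses)))
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda i: ifd_scores[i]
            - lam * max((cosine(vecs[i], vecs[j]) for j in chosen), default=0.0),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

responses = [
    "The cat sat on the mat.",
    "The cat sat on the mat.",                                  # duplicate answer
    "Use induction on n to prove the base and inductive cases.",
]
picked = select_subset(responses, ifd_scores=[0.9, 0.85, 0.5], k=2)
```

Here the second sample has a high IFD score but is an exact duplicate of the first response, so the redundancy penalty steers selection toward the lower-IFD but informative third sample: this is the point of measuring diversity on responses rather than instructions.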
Architecture
Figure 1(c)
Conceptual flowchart of the IterIT process compared to static selection. It shows the loop of 'Model Training' -> 'Metrics Update' -> 'Data Selection' repeating for each epoch.
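The per-epoch loop can be illustrated with a toy pool in which training shrinks a sample's difficulty, so the selected subset shifts between epochs. Every function and number below is a stand-in for the real training/scoring machinery, kept only to show why re-scoring changes the selection:

```python
def compute_scores(difficulty):
    # Metrics Update: re-score the whole candidate pool each epoch
    return dict(difficulty)

def select_top_k(scores, k):
    # Data Selection: here, simply the k currently-hardest samples
    return sorted(scores, key=scores.get, reverse=True)[:k]

def train_one_epoch(difficulty, subset):
    # Model Training: samples the model trains on become easier for it
    for s in subset:
        difficulty[s] *= 0.3

difficulty = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2}
history = []
for epoch in range(2):
    subset = select_top_k(compute_scores(difficulty), k=2)
    history.append(subset)
    train_one_epoch(difficulty, subset)
```

After one epoch, "a" and "b" have become easy, so the second selection swaps in "c" instead of re-training on them; a static ranking would have kept the initial subset for every epoch.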
Evaluation Highlights
  • Outperforms training on the full Alpaca dataset (Vanilla) using only 5% of the data, achieving +1.25% on MixEval-Hard
  • Surpasses strong baselines like Deita and GraphFilter on average across 7 standard benchmarks (GSM8K, MMLU, etc.) when training LLaMA-3-8B
  • Demonstrates superior generalization on code generation, beating Vanilla on HumanEval and MBPP+ when training on CodeAlpaca
Breakthrough Assessment
7/10
Strong empirical results showing dynamic data selection beats static methods and even full-dataset training. The iterative re-scoring idea is intuitive and effective, though the computational overhead of re-inference is a trade-off.