Evaluation Setup
Layer-wise gradient analysis of diverse LLMs on high- vs. low-quality data subsets
Benchmarks:
- WizardLM / Magpie / OpenHermes 2.5 (Instruction Following)
- s1.1 / GSM8K (Mathematical Reasoning)
Metrics:
- Nuclear Norm (Gradient Magnitude)
- Effective Rank (Gradient Diversity)
- Same-layer / Adjacent-layer Cosine Similarity
- Statistical methodology: Not explicitly reported in the paper
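The spectral metrics above can be sketched concretely. A minimal illustration, assuming a per-layer gradient is available as a 2-D matrix: nuclear norm is the sum of singular values, and effective rank follows the common entropy-based definition (exponential of the Shannon entropy of the normalized singular-value spectrum). Function names here are illustrative, not from the paper.

```python
import numpy as np

def gradient_spectral_metrics(G: np.ndarray) -> dict:
    """Spectral summaries of one layer's gradient matrix G.

    Nuclear norm proxies gradient magnitude ('effort'); effective rank
    proxies gradient diversity. Definitions are standard, not paper-specific.
    """
    s = np.linalg.svd(G, compute_uv=False)       # singular values, descending
    nuclear_norm = float(s.sum())                # sum of singular values
    p = s / s.sum()                              # normalized spectrum
    entropy = -np.sum(p * np.log(p + 1e-12))     # Shannon entropy (smoothed)
    effective_rank = float(np.exp(entropy))      # entropy-based effective rank
    return {"nuclear_norm": nuclear_norm, "effective_rank": effective_rank}

def gradient_cosine_similarity(g1: np.ndarray, g2: np.ndarray) -> float:
    """Cosine similarity between two flattened gradients,
    e.g. same-layer (across data subsets) or adjacent-layer comparisons."""
    a, b = g1.ravel(), g2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

As a sanity check, an identity-like gradient (maximally diverse directions) has effective rank near the full dimension, while a rank-1 gradient collapses to an effective rank near 1.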
Main Takeaways
- Unified Spectral Signature: Across all studied quality-selection metrics (IFD, Reward, etc.), high-quality data consistently yields gradients with lower nuclear norms (less 'effort') and higher effective ranks (more 'diversity').
- Reasoning Complexity: High-quality reasoning data (s1.1) induces the highest effective ranks among all datasets, suggesting that complex reasoning tasks require updating model parameters in a structurally richer way than simple instructions.
- Metric Superiority: Effective Rank is more sensitive and robust than Nuclear Norm in distinguishing subtle quality differences, particularly for reasoning tasks.
- Model Families: Gradient patterns (spectral properties) are consistent across different sizes within the same model family but diverge significantly between families (e.g., Qwen vs. Llama), indicating family-specific learning dynamics.