| Benchmark | Metric | Baseline (GRPO) | This Paper (NGRPO) | Δ (NGRPO - GRPO) |
|---|---|---|---|---|
| Advantage Analysis: These results quantify how NGRPO alters the learning signal compared to GRPO in specific group scenarios. In the case below, the advantage of the correct sample shrinks while the advantage of the incorrect sample becomes more negative. | | | | |
| N/A (Analytical Case Study) | Advantage (Correct Sample) | 2.47 | 1.76 | -0.71 |
| N/A (Analytical Case Study) | Advantage (Incorrect Sample) | -0.35 | -0.50 | -0.15 |
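
The table does not state the group configuration behind these values. As a rough sketch, one setup that reproduces them to two decimals is a group of 8 responses with binary rewards and exactly one correct response, standardized with the sample (n - 1) standard deviation, where the NGRPO column additionally includes a single virtual maximum-reward sample in the mean and standard deviation. All of these settings, and the helper name `group_advantages`, are assumptions made for illustration rather than details confirmed by the table.

```python
from statistics import mean, stdev


def group_advantages(rewards, virtual_max=None):
    """Standardize rewards within a group.

    If virtual_max is given, it is included in the mean/std only
    (assumed NGRPO-style calibration); no advantage is returned for it.
    """
    stats = list(rewards)
    if virtual_max is not None:
        stats.append(virtual_max)
    mu = mean(stats)
    sigma = stdev(stats)  # sample standard deviation (n - 1 denominator)
    return [(r - mu) / sigma for r in rewards]


# Assumed scenario: 8 responses, only the first one is correct.
rewards = [1.0] + [0.0] * 7

grpo = group_advantages(rewards)                     # plain group statistics
ngrpo = group_advantages(rewards, virtual_max=1.0)   # with virtual max-reward sample

for name, idx in (("correct", 0), ("incorrect", 1)):
    delta = ngrpo[idx] - grpo[idx]
    print(f"{name}: GRPO={grpo[idx]:.2f} NGRPO={ngrpo[idx]:.2f} delta={delta:.2f}")
```

Under these assumptions the virtual sample raises the group mean from 1/8 to 2/9, so every real sample's advantage shifts downward: the correct sample is rewarded less strongly and the incorrect samples are penalized slightly more, which matches the signs in the Δ column.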