Experiments using Qwen3-4B (non-thinking mode) to recover latent 'thinking' capabilities demonstrate NSR's ability to match strong baselines.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MATH | Pass@1 | 93.9 | 94.0 | +0.1 |
| MATH | Pass@64 | 98.2 | 98.0 | -0.2 |
| MATH | Pass@k (scaling trend) | Low | High | Positive |
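
For readers unfamiliar with the Pass@k rows above, the following is a minimal sketch of the commonly used unbiased pass@k estimator, which gives the probability that at least one of `k` samples drawn from `n` generated solutions is correct when `c` of them are correct. The function name `pass_at_k` and its parameters are illustrative assumptions; the table does not state which estimator or sample counts the paper actually used.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct.

    Assumption: this is the standard combinatorial estimator
    1 - C(n - c, k) / C(n, k); the paper's exact procedure is not
    given in the table above.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset has a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Per-problem estimates are typically averaged over the benchmark and reported as a percentage, which is the scale used in the Baseline and This Paper columns.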