| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| LongBench results demonstrate NAMM's ability to improve over the full-context baseline while compressing memory, unlike heuristic methods, which degrade performance. | ||||
| LongBench (All Tasks) | Normalized Performance | 1.00 | 1.11 | +0.11 |
| LongBench (Test Tasks - Held Out) | Normalized Performance | 1.00 | 1.07 | +0.07 |
| LongBench | Average Cache Size | 1024 | 733 | -291 |
| Decision Transformer (Atari Breakout) | Normalized Score | 1.0 | 1.9 | +0.9 |
| Stable Diffusion (MS-COCO) | FID (Fréchet Inception Distance) | 20.5 | 20.3 | -0.2 |