| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Language modeling results on WikiText-103 showing Memory Decoder effectiveness across GPT-2 scales. | | | | |
| WikiText-103 | Perplexity | 31.09 | 18.36 | -12.73 |
| WikiText-103 | Perplexity | 19.78 | 18.36 | -1.42 |
| Cross-model adaptation results demonstrating a single Memory Decoder (0.5B) improving the entire Qwen2.5 family on the financial domain. | | | | |
| Finance Domain | Perplexity | 11.75 | 6.87 | -4.88 |
| Finance Domain | Perplexity | 5.62 | 5.35 | -0.27 |
| Zero-shot downstream task performance, comparing preservation of general capabilities. | | | | |
| Average (9 tasks) | Score | 50.1 | 61.3 | +11.2 |
| Inference latency comparison. | | | | |
| Inference Speed | Latency overhead (× base model latency) | 2.17 | 1.28 | -0.89 |
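
The Δ column appears to be the simple difference between the "This Paper" and "Baseline" values, so negative values are improvements for perplexity and latency, while a positive value is an improvement for the task score. The sketch below recomputes that difference from the numbers in the table and also derives a relative change in percent, which the table does not report. The names `ROWS`, `delta`, and `relative_change_pct` are illustrative assumptions, not anything defined by the source.

```python
# Rows copied from the table above: (benchmark, metric, baseline, this_paper).
# The table does not label which model each row refers to, so none is added here.
ROWS = [
    ("WikiText-103", "Perplexity", 31.09, 18.36),
    ("WikiText-103", "Perplexity", 19.78, 18.36),
    ("Finance Domain", "Perplexity", 11.75, 6.87),
    ("Finance Domain", "Perplexity", 5.62, 5.35),
    ("Average (9 tasks)", "Score", 50.1, 61.3),
    ("Inference Speed", "Latency overhead", 2.17, 1.28),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute difference, assumed to match the table's delta column."""
    return round(ours - baseline, 2)


def relative_change_pct(baseline: float, ours: float) -> float:
    """Relative change in percent (a derived quantity, not in the table)."""
    return round(100.0 * (ours - baseline) / baseline, 1)


if __name__ == "__main__":
    for benchmark, metric, baseline, ours in ROWS:
        print(
            f"{benchmark:<18} {metric:<17} "
            f"delta={delta(baseline, ours):+.2f} "
            f"relative={relative_change_pct(baseline, ours):+.1f}%"
        )
```

Reporting the relative change next to the absolute difference makes rows with very different baselines easier to compare, since the same absolute perplexity drop means less on a weak baseline than on a strong one.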