| Benchmark | Metric | Baseline | This Paper | Δ (This Paper - Baseline) |
|---|---|---|---|---|
| On ∞-Bench (100K+ tokens), InfLLM matches fine-tuned models (22.82 vs. 22.86). | ||||
| ∞-Bench | Average Score | 22.86 | 22.82 | -0.04 |
| ∞-Bench | Average Score | 11.11 | 22.82 | +11.71 |
| On LongBench (mixed tasks), InfLLM outperforms sliding-window approaches (44.18 vs. 16.59). | ||||
| LongBench | Average Score | 16.59 | 44.18 | +27.59 |
| LongBench | Average Score | 45.03 | 44.18 | -0.85 |
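
The Δ column appears to be the "This Paper" score minus the "Baseline" score. The sketch below recomputes that difference for each row as a consistency check. The names `ROWS`, `delta`, and `mismatches` are illustrative and do not come from the source, and the scores are copied from the table as strings so that decimal arithmetic stays exact.

```python
from decimal import Decimal

# Rows copied from the table: (benchmark, baseline, this_paper, reported_delta).
# The table does not name the individual baselines, so none are named here.
ROWS = [
    ("∞-Bench", "22.86", "22.82", "-0.04"),
    ("∞-Bench", "11.11", "22.82", "+11.71"),
    ("LongBench", "16.59", "44.18", "+27.59"),
    ("LongBench", "45.03", "44.18", "-0.85"),
]


def delta(baseline: str, this_paper: str) -> Decimal:
    """Return this_paper - baseline using exact decimal arithmetic."""
    return Decimal(this_paper) - Decimal(baseline)


def mismatches(rows):
    """Return the rows whose reported delta differs from the recomputed one."""
    return [
        row for row in rows
        if delta(row[1], row[2]) != Decimal(row[3])
    ]


if __name__ == "__main__":
    for benchmark, baseline, this_paper, reported in ROWS:
        computed = delta(baseline, this_paper)
        print(benchmark, baseline, this_paper, reported, computed)
    print("inconsistent rows:", mismatches(ROWS))
```

Using `Decimal` rather than `float` avoids rounding noise such as `22.82 - 22.86` not comparing equal to `-0.04` in binary floating point.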