| Benchmark | Metric | Baseline | vLLM (This Paper) | Δ (absolute) |
|---|---|---|---|---|
| Memory efficiency: in A100 profiling, vLLM cuts internal fragmentation from 79.6% to 3.8% relative to the baseline. | ||||
| Profiling on A100 | Internal Fragmentation (%) | 79.6 | 3.8 | -75.8 |
| Batch size: lower memory waste translates into higher concurrency, with vLLM batching roughly 1.8x to 2.2x as many requests on average. | ||||
| ShareGPT (OPT-13B) | Average Batched Requests | 13.62 | 30.42 | +16.80 |
| Alpaca (OPT-13B) | Average Batched Requests | 72.75 | 132.44 | +59.69 |
| Throughput: vLLM sustains a higher request rate before latency spikes. | ||||
| Alpaca (OPT-175B, 8 GPUs) | Request Rate (req/s) before latency spike | 10 | 18 | +8 |
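
The Δ column reports absolute differences only, so the relative gains are easy to miss. As a minimal sketch, the following Python snippet recomputes each Δ from the table values and adds the ratio of the paper's result to the baseline. The dictionary keys and the `summarize` helper are illustrative names, not part of the paper or of vLLM.

```python
# Values copied from the table: (baseline, this paper).
# The labels and the helper below are hypothetical, chosen for illustration.
results = {
    "Internal fragmentation (%)": (79.6, 3.8),
    "ShareGPT (OPT-13B) avg. batched requests": (13.62, 30.42),
    "Alpaca (OPT-13B) avg. batched requests": (72.75, 132.44),
    "Alpaca (OPT-175B) request rate (req/s)": (10, 18),
}


def summarize(baseline, ours):
    """Return (absolute delta, ratio of ours to baseline)."""
    return ours - baseline, ours / baseline


for name, (baseline, ours) in results.items():
    delta, ratio = summarize(baseline, ours)
    print(f"{name}: delta={delta:+.2f}, ratio={ratio:.2f}x")
```

For the fragmentation row a ratio below 1 is the improvement, since lower is better; for the other rows a ratio above 1 is the improvement.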