Evaluation Setup
Long-context understanding and reasoning tasks.
Benchmarks:
- StrucText-Eval (Structured data extraction: JSON, code)
- RULER (Long-context benchmark)
- LongBench V2 (Long-context benchmark)
- MATH500 (Complex reasoning)
Metrics:
- Accuracy
- Inference Speedup (Latency reduction)
- Statistical methodology: Not explicitly reported in the paper
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| StrucText-Eval | Average Accuracy | Not reported in the paper | Not reported in the paper | Not reported in the paper |
| End-to-End Inference | Speedup vs. Full Attention | 1.0× | 3.6× | +2.6× |
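The speedup figure can be restated as a latency reduction, which is how the metric is described in the setup. The sketch below assumes speedup is defined as baseline latency divided by the new latency; the summary does not state the definition, and the function name is purely illustrative.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of baseline latency removed.

    Assumption (not stated in the source): speedup = t_baseline / t_new.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# 3.6x end-to-end speedup over full attention, as listed in the table
print(f"{latency_reduction(3.6):.1%}")  # prints 72.2%
```

Under that assumption, a 3.6× speedup corresponds to roughly 72% less end-to-end latency than full attention.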
Main Takeaways
- Structure-aware chunking significantly improves retrieval accuracy compared to fixed-size paging, especially on structured data like JSON.
- Hierarchical indexing allows for sub-linear retrieval complexity, breaking the latency bottleneck of linear scanning.
- LycheeCluster maintains robustness on reasoning-intensive tasks (MATH500) where heuristic baselines often fail due to loss of logical coherence.
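
The first two takeaways can be illustrated with a toy sketch. This is not LycheeCluster's implementation, which the summary does not describe in detail. It only contrasts fixed-size paging with cutting at JSON key boundaries, then shows a generic two-level (cluster, then member) lookup. All function names, the example document, and the vectors are hypothetical.

```python
import json


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Fixed-size paging: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def structure_aware_chunks(text: str) -> list[str]:
    """Cut at top-level JSON keys so each chunk is a complete key/value pair."""
    obj = json.loads(text)
    return [json.dumps({key: value}) for key, value in obj.items()]


def is_valid_json(s: str) -> bool:
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_level_search(query: list[float], clusters: list) -> str:
    """Score cluster centroids first, then scan only the best cluster's members.

    `clusters` is a list of (centroid, [(chunk_id, vector), ...]) pairs.
    """
    _, members = max(clusters, key=lambda c: dot(query, c[0]))
    return max(members, key=lambda m: dot(query, m[1]))[0]


# Hypothetical structured input
doc = json.dumps({"user": {"id": 7, "name": "Ada"},
                  "order": {"sku": "X1", "qty": 2}})
paged = fixed_size_chunks(doc, 20)
structured = structure_aware_chunks(doc)
print(sum(is_valid_json(c) for c in paged), "of", len(paged),
      "fixed-size chunks parse as JSON")
print(sum(is_valid_json(c) for c in structured), "of", len(structured),
      "structure-aware chunks parse as JSON")

# Hypothetical 2-D embeddings grouped into two clusters
clusters = [
    ([1.0, 0.0], [("a", [1.0, 0.1]), ("b", [0.9, -0.2])]),
    ([0.0, 1.0], [("c", [0.1, 1.0]), ("d", [-0.1, 0.8])]),
]
print(two_level_search([0.0, 1.0], clusters))
```

Fixed-size cuts land in the middle of keys and values, so the pages lose the boundaries a retriever would need, while each structure-aware chunk remains a self-contained unit. In the two-level lookup, only one cluster's members are scored per query. With N chunks split into about √N balanced clusters, that is on the order of √N comparisons rather than N, which is one common way a hierarchy yields sub-linear retrieval cost.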