Evaluation Setup
Document transcription and image localization
Benchmarks:
- OlmOCR-Bench (Document OCR / Text Extraction)
- LightOnOCR-bbox-bench (Image Bounding Box Localization) [New]
Metrics:
- F1 score (IoU threshold 0.5)
- Mean IoU
- Count Accuracy (exact match on number of boxes)
- OlmOCR-Bench score (the specific metric is not detailed in the source text)
- Statistical methodology: Not explicitly reported in the paper
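The three localization metrics listed above can be sketched for a single page as follows. This is a minimal illustration, not the benchmark's official scorer. The box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, the choice to average IoU over matched pairs only, and the names `iou`, `match_boxes`, and `page_metrics` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(pred: List[Box], gt: List[Box], threshold: float = 0.5) -> List[float]:
    """Greedy one-to-one matching (assumption): highest-IoU pairs are taken first.

    Returns the IoU of every matched pair whose IoU is at least `threshold`.
    """
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt, matched = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        matched.append(score)
    return matched


def page_metrics(pred: List[Box], gt: List[Box], threshold: float = 0.5) -> Dict[str, object]:
    """F1 at the IoU threshold, mean IoU over matched pairs, and count match."""
    matched = match_boxes(pred, gt, threshold)
    tp = len(matched)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    denom = precision + recall
    # Pages with no boxes on either side yield 0.0 here; a real harness may differ.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    mean_iou = sum(matched) / tp if tp else 0.0
    return {"f1": f1, "mean_iou": mean_iou, "count_match": len(pred) == len(gt)}
```

Count accuracy over a dataset would then be the fraction of pages where `count_match` is true. How the benchmark aggregates F1 and mean IoU across pages is not stated in this summary.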
Key Results
LightOnOCR-2-1B achieves state-of-the-art results on OCR benchmarks while being significantly smaller than competitors.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| OlmOCR-Bench | Overall Performance | Not reported in the paper | Not reported in the paper | Not reported in the paper |
Main Takeaways
- LightOnOCR-2-1B (1B parameters) outperforms 9B-scale models on OlmOCR-Bench, validating the efficiency of high-quality data curation and end-to-end training.
- Increasing the training resolution from 1024px to 1540px and scaling the data mixture 2.5x significantly improves handling of dense scientific text.
- Reinforcement learning with verifiable rewards (RLVR) effectively mitigates specific VLM failure modes, such as repetition loops and math formatting errors, without requiring massive supervised re-annotation.
- Checkpoint averaging (souping) and task-arithmetic merging allow controlling the trade-off between pure OCR quality and bounding box localization accuracy.
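The last takeaway can be illustrated with a small sketch of the two merging operations. This is not the paper's implementation. Checkpoints are modelled as plain dictionaries mapping parameter names to lists of floats, and the names `soup`, `task_arithmetic`, `ocr_base`, `bbox_tuned`, and `alpha` are hypothetical. The uniform average and the linear interpolation along the task vector are the standard forms of these techniques; the paper's exact recipe is not given in this summary.

```python
from typing import Dict, List

# Assumed toy checkpoint format: parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def soup(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Checkpoint averaging: uniform element-wise mean of all checkpoints."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


def task_arithmetic(base: Checkpoint, finetuned: Checkpoint, alpha: float) -> Checkpoint:
    """Task-arithmetic merge: base + alpha * (finetuned - base).

    alpha = 0 returns the base weights and alpha = 1 returns the fine-tuned
    weights. Intermediate values trade one capability against the other.
    """
    return {
        name: [b + alpha * (f - b) for b, f in zip(base[name], finetuned[name])]
        for name in base
    }


# Hypothetical checkpoints: an OCR-focused base and a bbox-tuned variant.
ocr_base = {"w": [1.0, 2.0]}
bbox_tuned = {"w": [3.0, 6.0]}

merged = task_arithmetic(ocr_base, bbox_tuned, alpha=0.5)
averaged = soup([ocr_base, bbox_tuned])
```

In this toy setting, sweeping `alpha` between 0 and 1 is the knob that would control the trade-off between OCR quality and localization accuracy. With real models the same arithmetic is applied to each parameter tensor.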