LE-MCTS outperforms strong single-model and ensemble baselines on complex reasoning tasks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MATH | Accuracy | Not reported in summary | Not reported in summary | +3.6% |
| MQA | Accuracy | Not reported in summary | Not reported in summary | +4.3% |