Smart-LLaMA consistently outperforms state-of-the-art baselines in F1 score across all four tested vulnerability types.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Custom Dataset | F1 improvement (Reentrancy) | Not explicitly reported | Not explicitly reported | +7.35 |
| Custom Dataset | F1 improvement (Delegatecall) | Not explicitly reported | Not explicitly reported | +9.55 |
| Custom Dataset | F1 improvement (Integer Overflow) | Not explicitly reported | Not explicitly reported | +7.82 |
| Custom Dataset | Accuracy improvement (Reentrancy) | Not explicitly reported | Not explicitly reported | +4.14 |