| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| JPS consistently outperforms baselines on HarmBench, particularly on the MIFR metric, indicating higher-utility responses. | ||||
| HarmBench | MIFR | 52.00 | 86.50 | +34.50 |
| HarmBench | ASR | 60.50 | 93.50 | +33.00 |
| HarmBench | MIFR | 74.00 | 83.00 | +9.00 |
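| Ablation studies confirm the necessity of both visual and textual components. In the rows below, the Baseline values (93.50 ASR, 86.50 MIFR) match the full method's scores reported above, and Δ is the change after ablation. | ||||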
| HarmBench | ASR | 93.50 | 18.50 | -75.00 |
| HarmBench | MIFR | 86.50 | 74.00 | -12.50 |
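
The Δ column appears to be the "This Paper" value minus the "Baseline" value. As a quick sanity check, the minimal sketch below recomputes it from the rows above. The `rows` list and the `delta` helper are illustrative names, not part of the paper, and the figures are copied from the table as-is.

```python
# Rows copied from the table: (benchmark, metric, baseline, this_paper, reported_delta).
# The tuple layout and names are assumptions made for this illustration.
rows = [
    ("HarmBench", "MIFR", 52.00, 86.50, 34.50),
    ("HarmBench", "ASR", 60.50, 93.50, 33.00),
    ("HarmBench", "MIFR", 74.00, 83.00, 9.00),
    ("HarmBench", "ASR", 93.50, 18.50, -75.00),
    ("HarmBench", "MIFR", 86.50, 74.00, -12.50),
]


def delta(baseline: float, this_paper: float) -> float:
    """Return the change in percentage points, rounded to two decimals."""
    return round(this_paper - baseline, 2)


for benchmark, metric, baseline, this_paper, reported in rows:
    computed = delta(baseline, this_paper)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{benchmark:<10} {metric:<5} {computed:+7.2f} (reported {reported:+7.2f}) {status}")
```

Under this reading, a positive Δ means the second column scores higher than the first, and the negative values in the ablation rows mean the score dropped.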