Evaluation Setup
The evaluation merges a malicious task-specific model with benign models for other tasks (e.g., CIFAR-100, GTSRB, SVHN), all built on CLIP-like backbones.
Benchmarks:
- CIFAR-100 (Image Classification)
- GTSRB (Traffic Sign Recognition)
- SVHN (Digit Classification)
Metrics:
- Attack Success Rate (ASR)
- Benign Accuracy (BA)
- Statistical methodology: Not explicitly reported in the paper
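The two metrics can be illustrated with a small sketch. It assumes the conventional definitions, which this summary does not spell out: BA is the fraction of clean inputs classified correctly, and ASR is the fraction of triggered inputs classified as the attacker's target class, counted only over inputs whose true label is not already the target. The function names and the toy predictions are hypothetical, not taken from the paper.

```python
def benign_accuracy(clean_preds, labels):
    """Fraction of clean inputs whose prediction matches the true label."""
    if len(clean_preds) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(clean_preds, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(triggered_preds, labels, target_class):
    """Fraction of triggered inputs predicted as the target class.

    Assumption: inputs whose true label already equals the target class
    are excluded, since they would count as successes without any attack.
    """
    if len(triggered_preds) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    eligible = [p for p, y in zip(triggered_preds, labels) if y != target_class]
    if not eligible:
        raise ValueError("no inputs with a non-target label")
    hits = sum(1 for p in eligible if p == target_class)
    return hits / len(eligible)


# Toy data (hypothetical): five test inputs, target class 1.
labels = [0, 1, 2, 3, 1]
clean_preds = [0, 1, 2, 0, 1]        # predictions on clean inputs
triggered_preds = [1, 1, 1, 3, 1]    # predictions on the same inputs with the trigger

ba = benign_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, labels, target_class=1)
```

In this toy case four of the five clean predictions are correct, and two of the three eligible triggered inputs are pushed to the target class.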
Key Results
Aggregate results reported in the introduction, comparing BadMerging with existing backdoor techniques in the context of model merging.

| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (points) |
|---|---|---|---|---|
| Merged Tasks (Aggregate) | Attack Success Rate (ASR) | 20.00 | 90.00 | +70.00 |

Main Takeaways
- Standard backdoor attacks fail (<20% ASR) because merging coefficients (often small, e.g., 0.3) scale down the trigger's effect
- BadMerging successfully compromises merged models (>90% ASR) by making the backdoor robust to coefficient scaling
- The attack remains effective even when the adversary does not know the other tasks being merged (off-task attack), which it handles by using shadow classes
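The coefficient-scaling effect in the first takeaway can be made concrete with a toy sketch. It assumes a task-arithmetic style merge (pretrained weights plus a coefficient times the sum of task vectors), which this summary does not specify. The two-dimensional weights, the trigger direction, and the 0.5 firing threshold are all hypothetical values chosen for illustration; only the coefficient 0.3 comes from the text above.

```python
def merge(pretrained, task_vectors, coeff):
    """Assumed merge rule: theta = theta_pre + coeff * sum of task vectors."""
    merged = list(pretrained)
    for tv in task_vectors:
        for i, delta in enumerate(tv):
            merged[i] += coeff * delta
    return merged


def trigger_score(weights, trigger):
    """Toy stand-in for how strongly the model responds to the trigger."""
    return sum(w * x for w, x in zip(weights, trigger))


# Hypothetical toy setup: coordinate 1 is the "backdoor" direction.
pretrained = [0.0, 0.0]
benign_tv = [0.4, 0.0]       # task vector of a benign model
malicious_tv = [0.0, 1.0]    # task vector carrying a standard backdoor
trigger = [0.0, 1.0]
THRESHOLD = 0.5              # illustrative: backdoor "fires" above this score

# The adversary's own model applies the full task vector (coefficient 1.0).
standalone = merge(pretrained, [malicious_tv], coeff=1.0)
# The merged model applies every task vector with a small coefficient.
merged = merge(pretrained, [benign_tv, malicious_tv], coeff=0.3)

standalone_fires = trigger_score(standalone, trigger) > THRESHOLD
merged_fires = trigger_score(merged, trigger) > THRESHOLD
```

Under these assumptions the backdoor fires in the standalone model but not in the merged one, because the merge shrinks the malicious update to 30% of its original size. A backdoor that is robust to coefficient scaling, as described for BadMerging, would need to stay above the firing threshold even after that shrinkage.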