Evaluation Setup
Empirical load testing and security scenario testing on a Virtual Machine (VM) running a simple weather MCP server protected by MCP Guardian
Benchmarks:
- Custom Security Scenarios (Threat Mitigation) [New]
- Local Load Test (Latency and Overhead Measurement) [New]
Metrics:
- Median Latency (ms)
- 95th Percentile Latency (ms)
- Statistical methodology: Not explicitly reported in the paper
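Because the statistical methodology is not reported, the way the two latency metrics were computed is unknown. The following is a minimal sketch of one common way to derive them from raw samples, assuming latencies are collected as a list of millisecond values. The function name `summarize_latencies` and the choice of the "inclusive" percentile method are assumptions for illustration, not the paper's procedure.

```python
import statistics


def summarize_latencies(samples_ms):
    """Return (median, p95) for a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    median = statistics.median(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # Assumption: "inclusive" interpolation; the paper does not say which was used.
    p95 = statistics.quantiles(samples_ms, n=100, method="inclusive")[94]
    return median, p95
```

Different percentile conventions can shift the 95th percentile slightly on small samples, which is one reason the unreported methodology matters when comparing numbers across papers.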
Key Results
Latency evaluation demonstrates the minimal computational overhead introduced by the middleware layer during a local load test.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Local Load Test | Median Latency (ms) | 25.1 | 28.9 | +3.8 |
| Local Load Test | 95th Percentile Latency (ms) | 32.4 | 36.7 | +4.3 |
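The Δ values in the table above can be checked, and turned into relative overhead, with a few lines of arithmetic. This sketch uses only the four latency figures reported in the table; the helper name `overhead` is hypothetical.

```python
def overhead(baseline_ms, guarded_ms):
    """Return (absolute delta in ms, relative overhead in percent)."""
    delta = guarded_ms - baseline_ms
    return delta, 100.0 * delta / baseline_ms


# Figures taken from the Local Load Test rows of the table.
for name, base, guarded in [("median", 25.1, 28.9), ("p95", 32.4, 36.7)]:
    d, pct = overhead(base, guarded)
    print(f"{name}: +{d:.1f} ms ({pct:.1f}%)")
# median: +3.8 ms (15.1%)
# p95: +4.3 ms (13.3%)
```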
Main Takeaways
- Effectively blocked unauthorized access attempts when invalid or missing tokens were provided to the system
- Successfully intercepted malicious inputs (e.g., 'rm -rf /') via Web Application Firewall (WAF) regex scanning before they reached the vulnerable tool server
- Rate limiting correctly restricted high-frequency abuse during a 100-request stress test, returning a '429 Too Many Requests' error after 5 calls to prevent resource exhaustion
- Performance overhead is small: about 3.8 ms at the median and 4.3 ms at the 95th percentile, roughly 13-15% over baseline in the local load test, which suggests the middleware is viable for real-world deployments without noticeably slowing AI responsiveness
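The three security behaviors listed above (token check, WAF regex scan, rate limiting) can be sketched as a single guard function. This is an illustration under stated assumptions, not MCP Guardian's actual code: the class `Guard`, the names `VALID_TOKENS`, `BLOCKED_PATTERNS`, and `WINDOW_SECONDS`, the 60-second window, the order of the checks, and the 401/403 status codes are all hypothetical. Only the `rm -rf /` example, the limit of 5 calls, and the 429 response come from the summary.

```python
import re
import time

VALID_TOKENS = {"demo-token"}                     # hypothetical token store
BLOCKED_PATTERNS = [re.compile(r"rm\s+-rf\s+/")]  # example input from the summary
MAX_CALLS = 5                                     # summary: 429 after 5 calls
WINDOW_SECONDS = 60.0                             # assumption: window is not stated


class Guard:
    """Toy middleware: auth, then input scan, then rate limit."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._calls = {}  # token -> timestamps of accepted calls

    def check(self, token, tool_input):
        """Return an HTTP-style status code: 200, 401, 403, or 429."""
        # 1. Reject missing or invalid tokens (401 is an assumed code).
        if token not in VALID_TOKENS:
            return 401
        # 2. Block inputs matching a WAF pattern (403 is an assumed code).
        if any(p.search(tool_input) for p in BLOCKED_PATTERNS):
            return 403
        # 3. Sliding-window rate limit per token.
        now = self._clock()
        recent = [t for t in self._calls.get(token, [])
                  if now - t < WINDOW_SECONDS]
        if len(recent) >= MAX_CALLS:
            self._calls[token] = recent
            return 429
        recent.append(now)
        self._calls[token] = recent
        return 200
```

Replaying the 100-request stress test against this sketch, with a fixed clock so every call falls inside one window, yields five accepted calls followed by 429 responses:

```python
guard = Guard(clock=lambda: 0.0)
results = [guard.check("demo-token", "weather in Paris") for _ in range(100)]
print(results.count(200), results.count(429))  # 5 95
```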