Evaluation Setup
Task: encrypted traffic classification (ETC) on standard benchmarks
Benchmarks:
- Specific dataset names are not listed in the provided text (encrypted traffic classification)
Metrics:
- Accuracy (Frozen Encoder)
- Accuracy (Full Fine-tuning)
- Statistical methodology: Not explicitly reported in the paper
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| Standard ETC Datasets | Accuracy (Frozen Encoder) | 90.0 | 47.0 | -43.0 |
| Standard ETC Datasets | Accuracy | Not reported in the paper | Not reported in the paper | Not reported in the paper |

Notes:
- The paper highlights a critical failure in prior work under frozen encoder evaluation.
- Specific numbers for FlowSem-MAE vs. baselines are described only qualitatively in the abstract/intro text provided.
Main Takeaways
- Byte-level pretraining provides minimal benefit over random initialization for feature extraction (frozen encoder), relying almost entirely on supervision during fine-tuning.
- Protocol-native modeling (FlowSem-MAE) successfully learns transferable representations that work well even without full fine-tuning.
- Filtering unpredictable fields (P1) and separating embeddings (P2) are critical for reducing gradient noise and semantic confusion.
- Dual-axis attention is necessary to capture the inherent 2D structure (Time × Fields) of traffic flows.
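The frozen-encoder evaluation behind the first takeaway is linear probing: freeze the pretrained encoder, train only a linear classifier on its features, and compare against full fine-tuning. The sketch below is illustrative, not the paper's code; the random-projection "encoder" and toy data are stand-ins for a real pretrained model and real flow features.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(x):
    # Stand-in for a pretrained encoder with frozen weights: a fixed
    # projection that receives no gradient updates during evaluation.
    w = np.random.default_rng(42).normal(size=(x.shape[1], 16))
    return np.tanh(x @ w)

# Toy two-class data (placeholders for real flow features and labels).
x_train = rng.normal(size=(200, 32))
y_train = (x_train[:, 0] > 0).astype(int)

# Extract features once; the encoder itself is never trained here.
feats = frozen_encoder(x_train)

# Linear probe: a least-squares classifier on top of the frozen features.
# Low probe accuracy means the representation, not the classifier, is weak.
w_probe, *_ = np.linalg.lstsq(feats, 2 * y_train - 1, rcond=None)
pred = (feats @ w_probe > 0).astype(int)
acc = (pred == y_train).mean()
```

If a model scores well fine-tuned but near chance under this probe, its accuracy comes from supervision during fine-tuning rather than from pretraining, which is the failure mode the paper reports for byte-level models.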
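P1 and P2 can be sketched concretely. Below, "unpredictable" fields are approximated by near-maximal empirical entropy (random nonces and encrypted bytes look uniform), and each field type gets its own embedding table so identical byte values in different fields do not share one vector. Field names, the entropy threshold, and the classes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def entropy(vals):
    # Empirical Shannon entropy (bits) of a field's observed values.
    _, counts = np.unique(vals, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def filter_unpredictable(columns, threshold_bits=7.0):
    # P1: drop fields whose values are effectively random, so the
    # reconstruction loss is not spent memorizing noise.
    return {name: v for name, v in columns.items()
            if entropy(v) < threshold_bits}

class FieldEmbeddings:
    # P2: one embedding table per field type, so e.g. byte 0x06 as a
    # protocol number and 0x06 as a TTL map to different vectors.
    def __init__(self, field_names, vocab=256, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.tables = {f: rng.normal(size=(vocab, dim))
                       for f in field_names}

    def __call__(self, field, values):
        return self.tables[field][values]

# Illustrative usage: a predictable field survives P1, a nonce does not.
columns = {
    "ttl": np.array([64] * 100 + [128] * 100),
    "tls_random": np.arange(200),  # unique per flow: near-maximal entropy
}
kept = filter_unpredictable(columns)

emb = FieldEmbeddings(["proto", "ttl"])
v_proto = emb("proto", np.array([6]))
v_ttl = emb("ttl", np.array([6]))
```

Separate tables remove the semantic confusion the paper attributes to shared byte vocabularies, and filtering removes gradient noise from fields that no model could predict.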
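The dual-axis idea treats a flow as a (Time × Fields) grid and attends along each axis in turn: packets of the same field attend over time, then fields of the same packet attend over the field axis. This minimal numpy sketch shows the factorized attention pattern under assumed shapes; it is single-head and unparameterized, unlike a real implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Scaled dot-product self-attention over axis -2 of a (batch, N, D)
    # array; queries, keys, and values are all x (no learned projections).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def dual_axis_attention(flow):
    # flow: (time, fields, dim). Time axis first: each field's sequence
    # of packet values becomes one batch item of length `time`.
    over_time = self_attention(np.swapaxes(flow, 0, 1))  # (fields, time, dim)
    x = np.swapaxes(over_time, 0, 1)                     # (time, fields, dim)
    # Field axis second: within each packet, header fields attend
    # to each other.
    return self_attention(x)                             # (time, fields, dim)

# Illustrative usage: 6 packets, 4 header fields, 8-dim embeddings.
flow = np.random.default_rng(1).normal(size=(6, 4, 8))
out = dual_axis_attention(flow)
```

Factorizing attention this way keeps the 2D structure explicit instead of flattening the grid into one long token sequence, where time and field relations would be entangled.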