
MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

Jianfei Yang, He Huang, Yunjiao Zhou, Xinyan Chen, Yuecong Xu, Shenghai Yuan, Han Zou, Chris Xiaoxuan Lu, Lihua Xie
Nanyang Technological University (School of Electrical and Electronic Engineering); University of Edinburgh (School of Informatics)
arXiv (2023)
MM Benchmark

📝 Paper Summary

Human Pose Estimation (HPE), Wireless Sensing, Multi-modal Learning
MM-Fi is a large-scale, synchronized multi-modal dataset comprising RGB, depth, LiDAR, mmWave radar, and WiFi CSI data for non-intrusive 4D human sensing and action recognition.
Core Problem
Existing human sensing solutions rely on intrusive cameras (privacy concerns, lighting sensitivity) or wearable sensors (inconvenient), while current wireless datasets lack multi-modal diversity and scale.
Why it matters:
  • Camera-based sensing compromises privacy in homes/hospitals and fails in poor lighting.
  • Wearable sensors require strict user compliance, which is impractical for long-term monitoring.
  • Existing wireless datasets typically support fewer than three modalities, hindering the development of robust multi-modal fusion algorithms for healthcare and metaverse applications.
Concrete Example: In a dark bedroom or bathroom, a camera-based system is either ruled out by privacy constraints or fails to detect a fall because of poor lighting, while a WiFi-only system might lack the spatial resolution for fine-grained pose estimation. Current datasets do not provide the synchronized data needed to train a system that fuses both signals effectively.
Key Novelty
Five-Modality Non-Intrusive Sensing
  • Integrates five distinct synchronized modalities (RGB, Depth, LiDAR, mmWave, WiFi CSI) into a single dataset, bridging vision and wireless sensing.
  • Provides 4D (spatial-temporal) labels including 3D keypoints and action categories for 27 actions across 40 subjects.
  • Uses a custom ROS-based mobile robot platform to capture aligned data in diverse environments, overcoming the challenge of synchronizing sensors that sample at different rates (cameras, LiDAR, mmWave radar, WiFi).
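To make the synchronization problem concrete, the following is a minimal sketch of nearest-timestamp matching across sensor streams. It is a generic illustration, not the authors' pipeline: the summary does not describe how MM-Fi aligns its streams, and the function names (`nearest_index`, `align_streams`), the `max_gap` tolerance, and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose the closer of the two neighbours; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(reference, streams, max_gap):
    """Match every reference timestamp to the nearest sample of each stream.

    `reference` is a sorted list of timestamps (seconds) from the stream
    chosen as the clock (hypothetically, the slowest sensor).
    `streams` maps a stream name to its sorted timestamps.
    A reference frame is dropped if any stream has no sample within
    `max_gap` seconds of it.
    """
    aligned = []
    for t in reference:
        row = {}
        for name, ts in streams.items():
            j = nearest_index(ts, t)
            if abs(ts[j] - t) > max_gap:
                row = None  # this stream has a hole around t
                break
            row[name] = j
        if row is not None:
            aligned.append((t, row))
    return aligned


# Hypothetical timestamps: a 10 Hz reference clock, a ~30 Hz camera,
# and a wireless stream with a dropout after 0.11 s.
reference = [0.0, 0.1, 0.2]
streams = {
    "rgb": [0.0, 0.033, 0.066, 0.1, 0.133, 0.166, 0.2],
    "wifi": [0.01, 0.11, 0.35],
}
aligned = align_streams(reference, streams, max_gap=0.05)
```

Here the third reference frame is discarded because the nearest wireless sample lies outside the tolerance, so only fully covered frames survive. A real pipeline would also have to handle clock offsets between devices, which this sketch ignores.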
Architecture
Figure 1: The mobile sensor platform and the five sensing modalities (RGB, Depth, LiDAR, mmWave, WiFi) capturing a human subject.
Evaluation Highlights
  • Dataset contains over 320k synchronized frames across 5 modalities from 40 human subjects.
  • Achieves high-quality ground truth annotations with a re-projection PCKh@0.5 of 95.66%.
  • Benchmarks demonstrate that fusing modalities (e.g., LiDAR + mmWave) significantly improves pose estimation accuracy compared to single wireless modalities.
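For readers unfamiliar with the PCKh@0.5 figure quoted above, the sketch below shows how the metric is commonly computed: a keypoint counts as correct when its distance to the ground truth is at most 0.5 times the head size. This is an illustration under assumptions, not the paper's evaluation code; the function name `pckh`, the way `head_size` is supplied, and the sample coordinates are hypothetical.

```python
import math


def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * head_size.

    `pred` and `gt` are equal-length sequences of coordinate tuples
    (2D here, for re-projected keypoints). `head_size` is the head
    segment length in the same units, supplied by the caller.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    threshold = alpha * head_size
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)


# Hypothetical pixel coordinates for four keypoints.
gt = [(0, 0), (10, 0), (20, 0), (30, 0)]
pred = [(1, 0), (10, 3), (20, 6), (30, 0)]
score = pckh(pred, gt, head_size=10)  # threshold is 5 px; errors are 1, 3, 6, 0
print(score)  # 0.75
```

Normalizing by head size makes the threshold scale with the subject's apparent size in the image, so the score is comparable across subjects at different distances from the sensor.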
Breakthrough Assessment
9/10
MM-Fi is the first dataset to synchronize five non-intrusive modalities (especially LiDAR, mmWave, and WiFi together), enabling new research in cross-modal supervision and robust wireless sensing.