M3D-Data: The large-scale 3D medical dataset proposed in this paper, containing 120K image-text pairs and 662K instruction-response pairs
M3D-LaMed: The proposed 3D multi-modal large language model designed to process 3D medical images directly
M3D-Bench: The proposed benchmark suite for evaluating 3D medical MLLMs across 8 tasks
3D Spatial Pooling Perceiver: A module that reduces the number of visual tokens from the 3D encoder via 3D pooling before feeding them to the LLM
SegVol: A promptable 3D segmentation model used as the segmentation module in the M3D-LaMed architecture
M3D-Cap: The image-text pair subset of M3D-Data, used for pre-training
M3D-Seg: The segmentation subset of M3D-Data, compiled from public datasets
Referring Expression Segmentation: A task where the model segments a specific region in an image based on a natural language description
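
To make the 3D Spatial Pooling Perceiver entry more concrete, the sketch below shows only the token-reduction idea: visual tokens laid out on a depth x height x width grid are pooled over small 3D windows, so fewer tokens reach the LLM. It is an illustration under stated assumptions, not the paper's implementation. The function name `pool_tokens_3d`, the use of average pooling, the depth-major token order, and the grid and kernel sizes are all hypothetical choices made for this example, and any projection layer that follows the pooling is omitted.

```python
from itertools import product


def pool_tokens_3d(tokens, grid, kernel):
    """Average-pool a flat token sequence laid out on a D x H x W grid.

    tokens: list of D*H*W feature vectors (lists of floats), depth-major order.
    grid:   (D, H, W) shape of the token grid (assumed layout).
    kernel: (kd, kh, kw) pooling window; must divide the grid evenly.
    Returns the pooled flat token list and the new grid shape.
    """
    D, H, W = grid
    kd, kh, kw = kernel
    if len(tokens) != D * H * W:
        raise ValueError("token count does not match grid shape")
    if D % kd or H % kh or W % kw:
        raise ValueError("kernel must divide the grid evenly")

    channels = len(tokens[0])
    out_grid = (D // kd, H // kh, W // kw)
    window = kd * kh * kw

    pooled = []
    # Visit each output cell, then average the tokens inside its 3D window.
    for od, oh, ow in product(*(range(n) for n in out_grid)):
        acc = [0.0] * channels
        for dd, dh, dw in product(range(kd), range(kh), range(kw)):
            d, h, w = od * kd + dd, oh * kh + dh, ow * kw + dw
            vec = tokens[(d * H + h) * W + w]
            for c in range(channels):
                acc[c] += vec[c]
        pooled.append([a / window for a in acc])
    return pooled, out_grid


# Toy input: a 4 x 4 x 4 grid of one-channel tokens (64 tokens in total).
toy_tokens = [[float(i)] for i in range(4 * 4 * 4)]
pooled_tokens, pooled_grid = pool_tokens_3d(toy_tokens, (4, 4, 4), (2, 2, 2))
print(len(toy_tokens), "->", len(pooled_tokens), "tokens, grid", pooled_grid)
```

With a 2 x 2 x 2 window the token count drops by a factor of 8, which is the point of pooling before the LLM: the sequence length is cut while each remaining token still summarizes a local 3D neighborhood.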