ISLR: Isolated Sign Language Recognition, the task of classifying a video of a single sign into one word (gloss), as opposed to continuous sign language translation, which handles sequences of many consecutive signs.
Gloss: A written label (usually an English word) that represents a specific sign.
RGB-D: Red, Green, Blue, and Depth; video data in which every pixel carries both color values and its distance from the camera.
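ST-GCN: Spatio-Temporal Graph Convolutional Network, a deep learning architecture that models the human skeleton as a graph over time: nodes are body joints, spatial edges follow the bones within a frame, and temporal edges link the same joint across consecutive frames.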
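To make the "skeleton as a graph over time" idea concrete, here is a minimal pure-Python sketch of one simplified spatio-temporal graph-convolution block. It is not the original ST-GCN implementation: the five-joint skeleton, the helpers `normalized_adjacency`, `spatial_graph_conv` and `st_gcn_block`, and all weights are hypothetical; a single degree-normalized adjacency replaces ST-GCN's partitioned neighbour sets, and a fixed moving average over frames stands in for its learned temporal convolution.

```python
import math
import random

# Hypothetical toy skeleton (illustrative only, not a real pose format):
# 0 = neck, 1 = right shoulder, 2 = right elbow, 3 = right wrist, 4 = head.
NUM_JOINTS = 5
BONES = [(0, 1), (1, 2), (2, 3), (0, 4)]


def normalized_adjacency(num_joints, bones):
    """Return D^-1/2 (A + I) D^-1/2: bone edges plus self-loops, degree-normalized."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(frame, adj, weight):
    """Mix each joint's features with its skeleton neighbours, then project C_in -> C_out.

    frame: V x C_in features for one frame; adj: V x V; weight: C_in x C_out.
    """
    v, c_in, c_out = len(frame), len(weight), len(weight[0])
    mixed = [[sum(adj[u][k] * frame[k][c] for k in range(v)) for c in range(c_in)]
             for u in range(v)]
    return [[sum(mixed[u][c] * weight[c][o] for c in range(c_in)) for o in range(c_out)]
            for u in range(v)]


def st_gcn_block(clip, adj, weight, window=3):
    """Spatial graph conv on every frame, then a per-joint temporal average, then ReLU.

    clip: T x V x C_in. `window` should be odd; out-of-range frames are clamped
    to the first/last frame (edge padding).
    """
    spatial = [spatial_graph_conv(frame, adj, weight) for frame in clip]
    t_len, half = len(spatial), window // 2
    v, c_out = len(spatial[0]), len(spatial[0][0])
    out = []
    for t in range(t_len):
        idx = [min(max(s, 0), t_len - 1) for s in range(t - half, t + half + 1)]
        out.append([[max(0.0, sum(spatial[s][u][o] for s in idx) / len(idx))
                     for o in range(c_out)]
                    for u in range(v)])
    return out


# Usage: 16 frames of 2-D joint coordinates mapped to 8 feature channels.
random.seed(0)
clip = [[[random.uniform(-1, 1) for _ in range(2)] for _ in range(NUM_JOINTS)]
        for _ in range(16)]
weight = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(2)]
features = st_gcn_block(clip, normalized_adjacency(NUM_JOINTS, BONES), weight)
print(len(features), len(features[0]), len(features[0][0]))  # 16 5 8
```

Separating the spatial step (neighbours within one frame) from the temporal step (the same joint across frames) mirrors the two edge types in the definition; stacking several such blocks lets information spread across the whole skeleton and over longer time spans.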
MST-Net: Multi-scale Spatial Temporal Network, a sign language recognition method reported to achieve state-of-the-art results.
Kinect-V2: Microsoft's depth camera that measures depth with Time-of-Flight (ToF) technology, offering high resolution.
RealSense: Intel's depth camera line that measures depth with stereo vision technology, offering higher frame rates than the Kinect-V2.
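The two depth-sensing principles named in the Kinect-V2 and RealSense entries can be summarized by their textbook depth formulas. The sketch below is only an illustration of those formulas, not how either camera's firmware computes depth; the function names and all numeric inputs (round-trip time, modulation frequency, focal length, baseline, disparity) are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_depth_pulsed(round_trip_s):
    """Pulsed Time-of-Flight: light goes out and back, so depth = c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def tof_depth_phase(phase_rad, modulation_hz):
    """Continuous-wave ToF: depth = c * phi / (4 * pi * f_mod).

    Only unambiguous up to c / (2 * f_mod); beyond that the measured phase wraps around.
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)


def stereo_depth(focal_px, baseline_m, disparity_px):
    """Rectified stereo pair: depth Z = f * B / d (f and d in pixels, B in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Usage with hypothetical numbers (not the specifications of either camera):
print(tof_depth_pulsed(10e-9))          # about 1.5 m for a 10 ns round trip
print(tof_depth_phase(math.pi, 80e6))   # about 0.94 m at an 80 MHz modulation
print(stereo_depth(640.0, 0.05, 16.0))  # 2.0 m
```

Because stereo depth is inversely proportional to disparity, a one-pixel disparity error costs more depth accuracy for distant points than for near ones, whereas ToF measures the light's travel time (or phase) directly.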
In-the-Wild (ITW): Test data where green screen backgrounds are replaced with real-world dynamic/static scenes to simulate natural environments.
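Replacing a green-screen background can be sketched as simple chroma-key compositing. This is an assumption about how such a replacement could work, not the documented ITW pipeline: the helper names and the thresholds `min_green` and `dominance` are hypothetical, and production keyers typically use soft mattes and color-spill suppression rather than this hard per-pixel test. A static scene is modeled as a single background frame and a dynamic scene as a background video.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = List[List[Pixel]]     # rows of pixels


def is_green(pixel: Pixel, min_green: int = 100, dominance: int = 40) -> bool:
    """Hypothetical hard chroma-key test: bright green that clearly dominates red and blue."""
    r, g, b = pixel
    return g >= min_green and g - max(r, b) >= dominance


def composite_frame(fg: Frame, bg: Frame) -> Frame:
    """Replace green-screen pixels in `fg` with the co-located pixels of `bg` (same size)."""
    return [[bg_px if is_green(fg_px) else fg_px for fg_px, bg_px in zip(fg_row, bg_row)]
            for fg_row, bg_row in zip(fg, bg)]


def composite_clip(fg_frames: List[Frame], bg_frames: List[Frame]) -> List[Frame]:
    """Composite every signer frame onto a background.

    A static scene is a single background frame; a dynamic scene is a background
    video, looped here if it is shorter than the sign clip.
    """
    return [composite_frame(frame, bg_frames[i % len(bg_frames)])
            for i, frame in enumerate(fg_frames)]


# Usage: a 1 x 2 "frame" with one green-screen pixel and one skin-tone pixel.
signer = [[(0, 255, 0), (200, 150, 120)]]
street = [[(90, 90, 100), (60, 70, 80)]]
print(composite_frame(signer, street))  # [[(90, 90, 100), (200, 150, 120)]]
```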