Omni Parsing: Transforming unstructured signals into standardized knowledge that is Locatable, Enumerable, and Traceable
Evidence Anchoring: A mechanism ensuring high-level semantic descriptions are strictly aligned with and traceable to low-level facts (e.g., bounding boxes or timestamps)
Holistic Detection: The Level 1 parsing task, which achieves precise spatial-temporal grounding of objects or events to establish a geometric baseline
Fine-grained Recognition: The Level 2 parsing task, which performs symbolization (e.g., OCR, ASR) and attribute extraction on localized objects
Semantic Interpreting: The Level 3 parsing task, which constructs a reasoning chain from local semantics to global logic
SFT: Supervised Fine-Tuning—training a model on labeled examples to adapt it to specific tasks
VAD: Voice Activity Detection—identifying segments of audio that contain human speech
LID: Language Identification—automatically determining the language spoken in an audio clip
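
The relationship between the three parsing levels and Evidence Anchoring can be illustrated with a small data-structure sketch. Everything here is hypothetical and for illustration only: the class names (`Evidence`, `Recognition`, `Interpretation`), their fields, and the `unanchored_claims` helper are assumptions, not a schema taken from the source. The idea shown is simply that every Level 3 claim should cite at least one Level 1 fact (a bounding box or a time span) that actually exists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Level 1 output: a located region or time span (hypothetical schema)."""
    evidence_id: str
    bbox: Optional[Tuple[float, float, float, float]] = None  # (x1, y1, x2, y2)
    time_span: Optional[Tuple[float, float]] = None  # (start_s, end_s)


@dataclass(frozen=True)
class Recognition:
    """Level 2 output: a symbol or attribute read from one located region."""
    evidence_id: str
    text: str


@dataclass(frozen=True)
class Interpretation:
    """Level 3 output: a semantic claim plus the evidence ids it relies on."""
    claim: str
    evidence_ids: Tuple[str, ...]


def unanchored_claims(evidence, interpretations):
    """Return claims that cite no evidence or cite an unknown evidence id."""
    known = {e.evidence_id for e in evidence}
    return [
        i.claim
        for i in interpretations
        if not i.evidence_ids or any(eid not in known for eid in i.evidence_ids)
    ]


# Illustrative data only; the values are made up for this sketch.
evidence = [
    Evidence("e1", bbox=(10.0, 20.0, 110.0, 60.0)),  # e.g. a sign in a frame
    Evidence("e2", time_span=(3.5, 7.2)),            # e.g. a speech segment
]
recognitions = [
    Recognition("e1", "EXIT"),              # OCR-style result for e1
    Recognition("e2", "please leave now"),  # ASR-style result for e2
]
interpretations = [
    Interpretation("An evacuation is announced near the exit sign.", ("e1", "e2")),
    Interpretation("The building is on fire.", ()),  # cites no evidence
]

print(unanchored_claims(evidence, interpretations))
# → ['The building is on fire.']
```

In this sketch a claim with an empty or dangling evidence list is flagged rather than silently accepted, which is one simple way to make "traceable to low-level facts" checkable.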