VLM: Vision-Language Model—an AI model capable of processing both images/video and text to understand visual content
LLM: Large Language Model—an AI model trained on vast text data to generate and understand human language
Schema: A structured template defining what types of entities (e.g., 'Ingredient') and attributes (e.g., 'Quantity') to extract for a specific category
Canonicalization: The process of normalizing diverse raw category names into a standardized, duplicate-free list
Agentic Framework: A system in which AI models act as autonomous agents that plan and carry out a sequence of steps (for example, first defining a schema, then using it for extraction) to achieve a complex goal
NER: Named Entity Recognition—identifying specific items like names, dates, and locations in text or speech
OCR: Optical Character Recognition—converting text shown visually in images/video frames into machine-readable text
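
To make the Schema and Canonicalization entries concrete, here is a minimal Python sketch. It is an illustration only, not the implementation described in this document: the class names (`EntityType`, `Schema`), the helper functions, and the sample category names are all hypothetical, and plain string normalization stands in for whatever method (possibly model-based) is actually used to merge category names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """One kind of entity to extract, e.g. 'Ingredient' (hypothetical structure)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Schema:
    """A template listing the entity types to extract for one category."""
    category: str
    entity_types: list[EntityType] = field(default_factory=list)


def normalize(raw: str) -> str:
    """Lowercase, turn punctuation and underscores into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    return " ".join(cleaned.split())


def canonicalize(raw_names: list[str]) -> list[str]:
    """Map raw category names to a duplicate-free list, keeping first-seen order."""
    seen: set[str] = set()
    canonical: list[str] = []
    for raw in raw_names:
        key = normalize(raw)
        if key and key not in seen:
            seen.add(key)
            canonical.append(key.title())
    return canonical


# Assumed example data, chosen only to illustrate the two definitions.
recipe_schema = Schema(
    category="Cooking",
    entity_types=[EntityType("Ingredient", ["Quantity"])],
)
categories = canonicalize(
    ["Cooking", "cooking ", "COOKING!", "Home_Repair", "home repair"]
)
print(categories)
```

Here the five raw names collapse to two canonical categories. A purely string-based approach like this cannot merge synonyms (for instance "Cooking" and "Recipes"), which is one reason a real pipeline might rely on a language model for this step.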