DP: Data Preprocessing—tasks like cleaning, integrating, and transforming raw data into a usable format
ED: Error Detection—identifying incorrect values in a dataset
DI: Data Imputation—filling in missing values in a dataset
SM: Schema Matching—identifying whether two database columns (attributes) refer to the same concept
EM: Entity Matching—identifying whether two records refer to the same real-world entity
CTA: Column Type Annotation—inferring the semantic type (e.g., 'city', 'price') of a table column
AVE: Attribute Value Extraction—extracting specific attribute values from unstructured text descriptions
Knowledge Injection: Adding explicit rules or domain constraints (e.g., 'treat N/A as non-match') to the prompt at training or inference time
Instance Serialization: Converting structured table rows into a string format (e.g., 'col: val') for LLM input
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes the main model weights and trains small adapter matrices
Prefix Caching: A vLLM optimization that computes the common prompt prefix once and reuses that computation across all requests sharing the prefix, speeding up inference when many prompts begin identically
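Instance serialization and knowledge injection typically combine when a prompt is built, e.g. for entity matching (EM). A minimal Python sketch; the 'col: val' format follows the glossary entry above, while the exact prompt wording and helper names are illustrative assumptions, not a fixed API:

```python
def serialize(record: dict) -> str:
    """Instance serialization: turn a table row into 'col: val' pairs."""
    return " ".join(f"{col}: {val}" for col, val in record.items())

def build_em_prompt(rec_a: dict, rec_b: dict, rules: list[str]) -> str:
    """Build an EM prompt with injected domain rules (knowledge injection).

    The wording is illustrative; the point is that explicit rules sit
    alongside the two serialized records in the same prompt.
    """
    rule_text = "\n".join(f"- {r}" for r in rules)
    return (
        "Do the two records refer to the same real-world entity? "
        "Answer Yes or No.\n"
        f"Rules:\n{rule_text}\n"
        f"Record A: {serialize(rec_a)}\n"
        f"Record B: {serialize(rec_b)}"
    )

prompt = build_em_prompt(
    {"name": "iPhone 13", "price": "799"},
    {"name": "Apple iPhone 13", "price": "N/A"},
    rules=["treat N/A as non-match"],
)
```

Because every prompt starts with the same instruction and rules, this layout also pairs naturally with prefix caching.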
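The core of LoRA can be shown numerically: the frozen weight W is augmented with a low-rank product BA, and only the small matrices A and B are trained. A toy NumPy sketch of the forward pass (shapes and the alpha/r scaling follow the usual LoRA convention; this is not a training loop or the PEFT library API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 4        # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (128 params)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable adapter, small random init
B = np.zeros((d_out, r))                   # trainable adapter, zero init

def forward(x):
    # Effective weight is W + (alpha / r) * B @ A; W itself is never updated,
    # so only r * (d_in + d_out) = 48 adapter parameters are trained.
    return (W + (alpha / r) * B @ A) @ x

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted model starts identical to the base.
assert np.allclose(forward(x), W @ x)
```

After training, the update B @ A can be merged into W, so inference pays no extra cost.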
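Prefix caching can be illustrated conceptually: do the expensive work for a shared prefix once, then reuse it for every request that starts with it. This toy sketch is only an analogy, not vLLM's implementation (vLLM reuses KV-cache blocks for matching prefixes, enabled via its prefix-caching engine option):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def encode_prefix(prefix: str) -> int:
    # Stand-in for the expensive transformer forward pass over the prefix
    # (in vLLM, the reused artifact is the prefix's KV cache, not a number).
    return sum(ord(c) for c in prefix)

INSTRUCTION = "Answer Yes or No. Do these records match?\n"  # shared prefix

def run(request_suffix: str) -> int:
    state = encode_prefix(INSTRUCTION)  # cache hit after the first request
    return state + len(request_suffix)

for suffix in ["Record A ...", "Record B ...", "Record C ..."]:
    run(suffix)

# The prefix was encoded once (one miss) and reused for the later requests.
assert encode_prefix.cache_info().misses == 1
assert encode_prefix.cache_info().hits == 2
```

The batched data-preprocessing tasks above are a best case for this: every ED, DI, or EM request in a batch shares the same instruction prefix, so only the per-record suffix needs fresh computation.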