NL2SQL: Natural Language to SQL—systems that translate human language questions into database queries
Schema Ambiguity: Situations where multiple tables or columns have similar names or semantic roles (e.g., 'sales' vs 'gross_sales'), making user intent unclear
Schema Masking: The technique of temporarily removing specific columns or tables from the schema provided to an LLM to force it to generate queries using different data sources
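Conformal Prediction: A statistical framework for constructing prediction sets that contain the true label with at least a user-specified probability, assuming calibration and test data are exchangeable (a coverage guarantee, used here for recall control)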
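Nucleus Sampling: A text decoding method, also called top-p sampling, that samples each next token from the smallest set of highest-probability tokens whose cumulative probability reaches a threshold p; often used to generate diverse text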
SBERT: Sentence-BERT—a modification of the BERT network to derive semantically meaningful sentence embeddings that can be compared using cosine similarity
Schema Linking: The process of identifying which words in a natural language question correspond to which columns or tables in a database schema
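
Example (nucleus sampling): the decoding method defined above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular system. The function names nucleus_filter and nucleus_sample, the threshold p=0.9, and the toy next-token distribution are all assumptions made for the example.

```python
import random


def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches p, then renormalize to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus now covers at least p of the mass
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}


def nucleus_sample(probs, p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    filtered = nucleus_filter(probs, p)
    tokens = list(filtered)
    weights = list(filtered.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution (illustrative values only).
next_token_probs = {
    "sales": 0.5,
    "gross_sales": 0.3,
    "revenue": 0.15,
    "profit": 0.05,
}

kept = nucleus_filter(next_token_probs, p=0.9)
token = nucleus_sample(next_token_probs, p=0.9)
```

With these toy values the lowest-probability token falls outside the nucleus and can never be sampled, while the remaining tokens keep their relative ordering. Raising p toward 1 widens the nucleus and increases diversity; lowering it makes decoding closer to greedy.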