auto-regressive: A generation process in which the model predicts each element of a sequence conditioned on all previously generated elements
table-wide prompting: Prompting an LLM to generate an entire table or batch of rows in a single text output
cell-by-cell generation: Prompting an LLM to generate one specific cell value at a time, conditioned on previously generated cells in the row
conditional distribution: The probability distribution of a variable (e.g., Ethnicity) given the value of another variable (e.g., Age Group)
token bias: The tendency of LLMs to assign probabilities based on how frequently tokens appear in their training corpus rather than on the statistical context of the tabular task
statistical fidelity: How accurately the synthetic data reproduces the statistical properties (correlations, distributions) of the real-world data
GReaT: Generation of Realistic Tabular data, a framework that fine-tunes LLMs to generate tabular data by treating rows as text sequences
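
The conditional distribution and cell-by-cell generation entries above can be made concrete with a small sketch. It uses no LLM: it estimates P(Ethnicity | Age Group) from co-occurrence counts, then fills a row one cell at a time, drawing each cell conditioned on the cell generated before it. The toy rows, the category labels, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy rows of (age_group, ethnicity); the values are illustrative only.
ROWS = [
    ("18-34", "A"), ("18-34", "A"), ("18-34", "B"),
    ("35-64", "A"), ("35-64", "B"), ("35-64", "B"), ("35-64", "B"),
]


def marginal_distribution(values):
    """Estimate P(value) from raw counts."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def conditional_distribution(rows):
    """Estimate P(second column | first column) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for given, target in rows:
        counts[given][target] += 1
    return {
        given: {value: n / sum(c.values()) for value, n in c.items()}
        for given, c in counts.items()
    }


def sample(dist, rng):
    """Draw one value from a {value: probability} mapping."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


P_AGE = marginal_distribution([age for age, _ in ROWS])
P_ETH_GIVEN_AGE = conditional_distribution(ROWS)


def generate_row(rng):
    """Cell-by-cell: draw age_group first, then ethnicity given that age_group."""
    age = sample(P_AGE, rng)
    ethnicity = sample(P_ETH_GIVEN_AGE[age], rng)
    return age, ethnicity


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same rows
    for _ in range(3):
        print(generate_row(rng))
```

In this sketch, statistical fidelity would mean that a large sample of generated rows reproduces the conditional proportions stored in `P_ETH_GIVEN_AGE`. A generator affected by token bias would instead drift toward whichever category labels are most common in its training text.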