Tabby: The proposed architecture modification replacing standard transformer layers with column-specific Mixture-of-Experts
Plain: The proposed training technique, which serializes columns in a fixed order without permutations, unlike GReaT
Mixture-of-Experts (MoE): A neural network architecture where different sub-networks (experts) are activated for different inputs; here, experts are assigned to specific table columns
Machine Learning Efficacy (MLE): A metric evaluating synthetic data quality by training a classifier/regressor on synthetic data and testing it on real test data
GReaT: Prior state-of-the-art LLM-based tabular synthesis method that permutes column order during training to learn conditional distributions
Tab-DDPM: A diffusion-based tabular synthesis model
Distilled-GPT2: A smaller, distilled version of the GPT-2 language model used as the base for most experiments
Llama-3-8B: A large open-weights language model used for comparison to show Tabby's efficiency
LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique
GTT: GReaT combined with TapTap (pretraining) and Tabula (encoding), a strong LLM baseline
EOC: End-of-Column token introduced by the authors to delimit feature values in the serialization
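
To make the Plain and EOC entries concrete, here is a minimal sketch of fixed-order serialization with an end-of-column delimiter. It is an illustration under stated assumptions, not the authors' implementation: the token string `<EOC>`, the "column is value" template, and the helper names `serialize_row` and `parse_row` are all hypothetical.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical token string; the glossary only names the token "EOC".
EOC = "<EOC>"


def serialize_row(row: Mapping[str, object], columns: Sequence[str]) -> str:
    """Serialize one row in a fixed column order (no permutation),
    closing every feature value with the EOC delimiter."""
    # The "col is value" template is an assumption made for this sketch.
    return " ".join(f"{col} is {row[col]} {EOC}" for col in columns)


def parse_row(text: str, columns: Sequence[str]) -> Dict[str, str]:
    """Invert serialize_row by splitting the text on the EOC delimiter."""
    chunks = [c.strip() for c in text.split(EOC) if c.strip()]
    parsed: Dict[str, str] = {}
    for col, chunk in zip(columns, chunks):
        prefix = f"{col} is "
        # Strip the column-name prefix when present; otherwise keep the chunk.
        parsed[col] = chunk[len(prefix):] if chunk.startswith(prefix) else chunk
    return parsed


columns = ["age", "income", "label"]
row = {"age": 39, "income": 52000, "label": ">50K"}

text = serialize_row(row, columns)
recovered = parse_row(text, columns)
```

Because the column order is fixed, the delimiter alone is enough to map each generated chunk back to its column; parsed values come back as strings and would need casting to the original column types.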