TabPFN: Tabular Prior-Data Fitted Network—a transformer pretrained on synthetic data to perform classification on new tabular datasets in a single forward pass (zero-shot)
ICL: In-Context Learning—the ability of a model to learn from a small set of examples (support set) provided in the input prompt without weight updates
Zero-shot: Evaluating the model on a new task by conditioning on the support set in a single forward pass, without updating the model's weights
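The ICL/zero-shot setting can be illustrated with a toy numpy sketch: queries are classified by attention-style similarity to the support set, with no weight updates anywhere. This is an illustrative analogy only, not TabPFN's actual mechanism (TabPFN runs a full transformer over the whole support set plus queries); the function name and temperature parameter are mine.

```python
import numpy as np

def icl_predict(support_X, support_y, query_X, temperature=1.0):
    """Toy analogy for in-context learning: classify queries by
    softmax-weighted similarity to support examples (no training)."""
    # Negative squared distances act as attention logits.
    d2 = ((query_X[:, None, :] - support_X[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Weighted vote over the support labels.
    classes = np.unique(support_y)
    scores = np.stack([(w * (support_y == c)).sum(axis=1) for c in classes],
                      axis=1)
    return classes[scores.argmax(axis=1)]

# Support set: two well-separated clusters, ten examples each.
rng = np.random.default_rng(0)
support_X = np.vstack([rng.normal([-2, -2], 0.3, size=(10, 2)),
                       rng.normal([2, 2], 0.3, size=(10, 2))])
support_y = np.array([0] * 10 + [1] * 10)
preds = icl_predict(support_X, support_y,
                    np.array([[-2.0, -2.0], [2.0, 2.0]]))
print(preds)  # → [0 1]
```

The point of the analogy: all "learning" happens by conditioning on the support set inside one function evaluation, which is what distinguishes zero-shot ICL from fine-tuning.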
Fine-tuning: Updating the model's weights on the support set of the new task using gradient descent before inference
Decision Boundary: The surface in the feature space that separates samples belonging to different classes
Simplicity Bias: The tendency of neural networks to fit simple (e.g., linear) functions even when the data requires a more complex one
Forest Dataset Generator: A new method proposed in this paper that creates synthetic datasets by fitting decision trees to random noise, ensuring complex decision boundaries
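A hedged sketch of the forest-generator idea, assuming the natural reading of "fitting decision trees to random noise": fit a tree to random inputs with random labels, then label fresh inputs with the fitted tree, so targets follow a complex but consistent piecewise boundary. The function name and hyperparameters here are mine, not the paper's.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def make_tree_dataset(n_samples=512, n_features=4, n_classes=3,
                      max_depth=8, seed=0):
    """Sketch of a tree-based synthetic dataset generator: fit a
    decision tree to pure noise, then use the tree's predictions as
    labels, giving a complex axis-aligned piecewise boundary."""
    rng = np.random.default_rng(seed)
    # Random inputs and random labels -- no real signal yet.
    X_noise = rng.normal(size=(n_samples, n_features))
    y_noise = rng.integers(0, n_classes, size=n_samples)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(X_noise, y_noise)
    # Fresh inputs labeled by the fitted tree: complex, consistent targets.
    X = rng.normal(size=(n_samples, n_features))
    y = tree.predict(X)
    return X, y

X, y = make_tree_dataset()
print(X.shape, sorted(set(y.tolist())))
```

Because the tree memorizes noise up to its depth limit, the induced labeling function is highly non-linear by construction, which is exactly the property that counters a simplicity bias in the pretrained model.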
SCM: Structural Causal Model—a method used by the original TabPFN to generate realistic synthetic data with causal relationships
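A minimal SCM sampler, for intuition only (TabPFN's actual prior is far richer): each variable is a function of its causal parents plus independent noise, and the label is derived downstream. All structure and coefficients below are illustrative assumptions.

```python
import numpy as np

def sample_scm(n=1000, seed=0):
    """Minimal structural causal model:
        Z ~ N(0, 1)                    (root cause)
        X = 2*Z + noise                (Z -> X)
        Y = 1 if X + Z > 0 else 0      (Z, X -> label)
    Sampling ancestors before descendants yields a dataset whose
    features carry genuine causal dependencies."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 2 * z + 0.5 * rng.normal(size=n)
    y = (x + z > 0).astype(int)
    features = np.column_stack([z, x])
    return features, y

features, labels = sample_scm()
print(features.shape, labels.mean())
```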