LeaPR: Learned Programmatic Representations—models combining LLM-written feature functions with decision tree predictors
D-ID3: Dynamic ID3—a modified decision tree algorithm where an LLM generates new features on the fly to split specific leaf nodes based on misclassified examples
F2: Features FunSearch—an evolutionary-style algorithm where an LLM iteratively proposes batches of features to maximize their importance scores in a Random Forest
FunSearch: A method to search for functions in code space using an LLM and an evaluator, originally used for solving mathematical problems
ID3: Iterative Dichotomiser 3—a classical algorithm that builds a decision tree top-down from a dataset, greedily splitting each node on the attribute with the highest information gain
SHAP values: SHapley Additive exPlanations—a game-theoretic approach that explains the output of any machine learning model by assigning each input feature a contribution (its Shapley value) to an individual prediction
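RMSE: Root Mean Square Error—a standard regression metric, computed as the square root of the mean squared difference between predictions and true values; it penalizes large errors more heavily than mean absolute error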
programmatic features: Input features defined as executable code (Python functions) rather than static values or learned neural weights
impurity: A measure (such as entropy, Gini impurity, or variance) used in decision trees to quantify how mixed the labels are at a node
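To show how several of these terms fit together, here is a minimal sketch under stated assumptions. It is not the LeaPR implementation. Two hand-written feature functions stand in for LLM-written programmatic features. A single decision-tree split is chosen greedily by the largest impurity reduction, which is the ID3-style step. Because the toy target is numeric, the sketch swaps in variance for ID3's entropy; this is the CART-style regression criterion. The resulting one-split tree is then scored with RMSE. The names `FEATURES`, `variance`, `best_split`, `fit_stump`, and `rmse`, and the width/height data, are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical programmatic features: plain Python functions mapping a raw
# input (here a dict of measurements) to a number. In LeaPR these would be
# written by an LLM; these two are hand-written stand-ins.
Example = Dict[str, float]
Feature = Callable[[Example], float]

FEATURES: Dict[str, Feature] = {
    "area": lambda x: x["width"] * x["height"],
    "aspect_ratio": lambda x: x["width"] / x["height"],
}


def variance(ys: List[float]) -> float:
    """Regression impurity: variance of the labels reaching a node."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def best_split(xs: List[Example], ys: List[float],
               features: Dict[str, Feature]) -> Tuple[str, float, float]:
    """Greedy ID3-style step: return (feature, threshold, impurity reduction)."""
    parent, n = variance(ys), len(ys)
    best: Tuple[str, float, float] = ("", 0.0, 0.0)
    for name, f in features.items():
        values = [f(x) for x in xs]
        # Every distinct value except the largest is a candidate "<=" threshold.
        for t in sorted(set(values))[:-1]:
            left = [y for v, y in zip(values, ys) if v <= t]
            right = [y for v, y in zip(values, ys) if v > t]
            child = (len(left) * variance(left) + len(right) * variance(right)) / n
            if parent - child > best[2]:
                best = (name, t, parent - child)
    return best


def fit_stump(xs: List[Example], ys: List[float],
              features: Dict[str, Feature]) -> Callable[[Example], float]:
    """One-split tree that predicts the mean label of each leaf."""
    name, t, _ = best_split(xs, ys, features)
    if not name:  # no split reduces impurity: predict the global mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    f = features[name]
    left = [y for x, y in zip(xs, ys) if f(x) <= t]
    right = [y for x, y in zip(xs, ys) if f(x) > t]
    left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
    return lambda x: left_mean if f(x) <= t else right_mean


def rmse(preds: List[float], targets: List[float]) -> float:
    """Square root of the mean squared prediction error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))


# Toy data whose label happens to be width * height.
xs = [{"width": 1, "height": 1}, {"width": 2, "height": 2},
      {"width": 3, "height": 1}, {"width": 4, "height": 4}]
ys = [1.0, 4.0, 3.0, 16.0]

print(best_split(xs, ys, FEATURES))
model = fit_stump(xs, ys, FEATURES)
print(rmse([model(x) for x in xs], ys))
```

On this toy data the `area` feature wins, because the labels are exactly width times height. D-ID3 and F2 differ from this sketch in where the features come from. Instead of choosing from a fixed dictionary, D-ID3 has an LLM propose new feature functions for individual leaves. F2 has an LLM propose batches of features that are scored by their Random Forest importance.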