← Back to Paper List

GenZ: Foundational models as latent variable generators within traditional statistical models

Marko Jojic, Nebojsa Jojic
Arizona State University, Microsoft Research
arXiv (2025)
Recommendation MM

📝 Paper Summary

Neuro-symbolic AI Concept Bottleneck Models Hybrid Statistical Models
GenZ discovers interpretable binary features by iteratively prompting a frozen foundational model to explain the semantic difference between items where a statistical model makes large versus small prediction errors.
Core Problem
Foundational models possess general domain knowledge but often fail to capture dataset-specific statistical patterns (like local housing market dynamics) needed for accurate prediction.
Why it matters:
  • Pure statistical models capture dataset correlations but lack semantic interpretability regarding why predictions are made
  • Standard Concept Bottleneck Models rely on LLMs (Large Language Models) to propose features a priori, which fails when the LLM's training distribution diverges from specific dataset statistics
  • Directly asking LLMs to predict high-dimensional real-valued targets is ineffective because the error structure is difficult to describe in a text prompt
Concrete Example: In house price prediction, an LLM might generally know that 'size' matters, but fail to identify that 'architectural details' specifically predict prices in a local market, leading to high error (38%) compared to a model that learns these specific features from data.
Key Novelty
Error-Driven Semantic Feature Discovery
  • Instead of asking the LLM 'what features are important?', GenZ identifies groups of items where the statistical model currently fails (high residuals) vs. succeeds.
  • It prompts the LLM to find a semantic distinction (a binary 'concept') that separates these two groups, effectively translating statistical errors into interpretable text features.
  • Uses a Generalized EM (Expectation-Maximization) algorithm to jointly optimize the binary feature definitions (prompts) and the statistical mapping from features to targets.
Evaluation Highlights
  • Achieves 12% median relative error on house price prediction, significantly outperforming a GPT-5 baseline which yields 38% error using general domain knowledge.
  • Predicts Netflix movie embeddings with 0.59 cosine similarity using only discovered semantic features, matching the performance of traditional collaborative filtering with ~4000 user ratings.
  • Discovers interpretable features (e.g., 'historical war film') that act as latent variables to explain complex high-dimensional observation data.
Breakthrough Assessment
8/10
Offers a novel way to align LLM knowledge with dataset-specific statistics without gradient-based fine-tuning. The method of using statistical posteriors to drive prompt discovery is a significant methodological advance for neuro-symbolic integration.
×