Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories

📝 Paper Summary

Agentic Memory · Multi-agent collaboration · Automated Feature Engineering

MAGS unifies feature selection and generation into a collaborative multi-agent system where a router plans optimization paths and agents improve via memory-augmented in-context learning.

Core Problem

Existing feature engineering methods perform selection and generation separately, failing to balance redundancy reduction with the creation of meaningful new dimensions.

Why it matters:

Feature selection alone only filters existing features, so it can miss hidden interactions that predictive models need
Feature generation alone, without pruning, introduces redundant and suboptimal dimensions
Separate application of these techniques misses synergistic interactions, leading to suboptimal data representations in domains like predictive maintenance

Concrete Example: In predictive maintenance, simply selecting sensor signals (vibration, temperature) misses complex health indicators (e.g., failure probability), while generating indicators without selection creates a bloated, noisy feature set.

Key Novelty

Multi-Agent System with Long and Short-Term Memory (MAGS)

Models feature engineering as a teaming problem where a Router agent dynamically switches between a Selector (to prune) and a Generator (to expand) based on the current state
Treats feature sets as token sequences (postfix expressions), allowing LLM agents to manipulate them as language generation tasks
Uses a dual-memory mechanism: Short-term memory for immediate trajectory refinement within an iteration, and Long-term memory to retrieve high-quality historical demonstrations

Architecture

The three technical components of MAGS: the agentic teaming framework, the dual memory mechanism, and the offline RL module.

Breakthrough Assessment

7/10

Novel framing of feature engineering as an agentic planning problem with distinct router/selector/generator roles. The dual-memory integration is logically sound. Score limited by lack of visible quantitative results in the provided text.

⚙️ Technical Details

Problem Definition

Setting: Iterative feature set optimization via a multi-agent system

Inputs: Original feature set F_0

Outputs: Optimized feature set F* maximizing a task-specific scoring function S(.)
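Stated as an optimization problem, the setting is roughly the following. The search-space symbol \mathcal{F}(F_0) (the feature sets reachable from F_0 through selection and generation steps) and the operator names Select and Generate are assumptions introduced here for illustration; the summary itself does not give a formal definition.

```latex
F^{*} = \arg\max_{F \in \mathcal{F}(F_0)} S(F),
\qquad
F_{t+1} =
\begin{cases}
  \mathrm{Select}(F_t)   & \text{if the Router chooses selection at step } t,\\
  \mathrm{Generate}(F_t) & \text{if the Router chooses generation at step } t.
\end{cases}
```

Each step applies exactly one of the two operations, which is what lets the Router interleave pruning and expansion instead of running them as separate stages.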

Pipeline Flow

Router Agent (decides action type)
Execution Agent (Selector or Generator based on Router)
Scoring Environment (Evaluates new feature set)
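The three-stage loop above can be sketched in Python as follows. This is a minimal toy, assuming rule-based stand-ins for the agents: route, generate, select and toy_score are hypothetical functions, whereas in MAGS the Router, Generator and Selector are LLM agents and S(.) is downstream task performance.

```python
import random
from typing import Callable, List

# Toy stand-ins for the three pipeline stages. In MAGS the Router,
# Selector and Generator are LLM agents; these rules are hypothetical.
FeatureSet = List[str]  # each feature is a postfix expression, e.g. "a b +"


def route(features: FeatureSet, max_size: int) -> str:
    """Toy Router: prune when the set is at capacity, otherwise expand."""
    return "select" if len(features) >= max_size else "generate"


def generate(features: FeatureSet, rng: random.Random) -> FeatureSet:
    """Toy Generator: cross two existing features with a random operator."""
    a, b = rng.sample(features, 2)
    new = f"{a} {b} {rng.choice(['+', '-', '*', '/'])}"
    return features if new in features else features + [new]


def select(features: FeatureSet, score: Callable[[FeatureSet], float]) -> FeatureSet:
    """Toy Selector: drop the feature whose removal leaves the best score."""
    candidates = [features[:i] + features[i + 1:] for i in range(len(features))]
    return max(candidates, key=score)


def toy_score(features: FeatureSet) -> float:
    """Hypothetical scorer: rewards product terms, penalizes set size."""
    return sum(1.0 for f in features if "*" in f) - 0.1 * len(features)


def optimize(f0: FeatureSet, score: Callable[[FeatureSet], float],
             steps: int = 10, max_size: int = 6, seed: int = 0):
    rng = random.Random(seed)
    current = list(f0)
    best, best_score = list(current), score(current)
    for _ in range(steps):
        action = route(current, max_size)          # 1. Router picks the action type
        if action == "generate" and len(current) >= 2:
            current = generate(current, rng)        # 2a. Generator expands
        elif action == "select" and len(current) > 1:
            current = select(current, score)        # 2b. Selector prunes
        s = score(current)                          # 3. Scoring environment
        if s > best_score:
            best, best_score = list(current), s
    return best, best_score


best, best_score = optimize(["vibration", "temperature", "pressure"], toy_score, steps=20)
print(best, round(best_score, 2))
```

Tracking the best-scoring set separately from the current set mirrors the problem definition: the output is F*, not whatever set the loop happens to end on.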

System Modules

Router Agent

Analyzes the current state of the data to decide whether to trigger feature selection or generation

Model or implementation: LLM (Policy Network)

Generator Agent (Execution)

Creates new features by crossing existing ones using mathematical operators

Model or implementation: LLM (In-context learning)
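To make "crossing existing features" concrete, the sketch below enumerates the candidate space a Generator could draw from, writing each new feature as a postfix expression. The operator lists are assumptions (the summary does not give the paper's operator set), and the MAGS Generator proposes crossings through in-context learning rather than exhaustive enumeration.

```python
from itertools import combinations
from typing import List

# Hypothetical operator sets; the paper's exact operators are not given here.
BINARY_OPS = ["+", "-", "*", "/"]
UNARY_OPS = ["log", "sqrt"]


def cross_features(features: List[str]) -> List[str]:
    """Enumerate candidate new features as postfix expressions."""
    candidates: List[str] = []
    # Unary transforms of each existing feature, e.g. "vibration log".
    for f in features:
        for op in UNARY_OPS:
            candidates.append(f"{f} {op}")
    # Binary crosses of each unordered pair, e.g. "vibration temperature *".
    for a, b in combinations(features, 2):
        for op in BINARY_OPS:
            candidates.append(f"{a} {b} {op}")
    return candidates


print(cross_features(["vibration", "temperature"]))
```

Only one ordering of each pair is emitted, so results such as "temperature vibration -" are omitted; the candidate count also grows quadratically with the number of features, which is why exhaustive enumeration quickly becomes impractical.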

Selector Agent (Execution)

Identifies and removes redundant features to maintain compactness

Model or implementation: LLM (In-context learning)

Novel Architectural Elements

Router-driven iterative switching between generation and selection agents
Representation of feature sets as postfix token sequences to enable LLM-based manipulation
Dual-memory architecture integrating local trajectory feedback (short-term) with global historical bests (long-term)
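The dual-memory idea can be sketched as two small data structures whose contents are rendered into an agent's prompt. The class names, the top-k capacity rule for long-term memory, and the prompt layout are assumptions for illustration; the summary only states that short-term memory holds the current iteration's actions and feedback, and that long-term memory stores high-quality historical feature sets that are sampled randomly.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record: (feature set as postfix expressions, downstream score)
Record = Tuple[List[str], float]


@dataclass
class ShortTermMemory:
    """Per-agent action/feedback trajectory for the current iteration."""
    steps: List[Tuple[str, float]] = field(default_factory=list)

    def add(self, action: str, score: float) -> None:
        self.steps.append((action, score))

    def clear(self) -> None:  # reset at the start of a new iteration
        self.steps.clear()


@dataclass
class LongTermMemory:
    """Repository of high-scoring feature sets from earlier runs."""
    capacity: int = 50
    records: List[Record] = field(default_factory=list)

    def add(self, features: List[str], score: float) -> None:
        self.records.append((list(features), score))
        # Assumed retention rule: keep only the top-`capacity` records by score.
        self.records.sort(key=lambda r: r[1], reverse=True)
        del self.records[self.capacity:]

    def sample(self, k: int, rng: random.Random) -> List[Record]:
        # Random sampling of stored demonstrations.
        return rng.sample(self.records, min(k, len(self.records)))


def build_prompt(task: str, stm: ShortTermMemory, ltm: LongTermMemory,
                 rng: random.Random, k: int = 2) -> str:
    lines = [f"Task: {task}", "Historical high-quality feature sets:"]
    for feats, score in ltm.sample(k, rng):
        lines.append(f"  {' | '.join(feats)}  (score={score:.3f})")
    lines.append("Recent steps in this iteration:")
    for action, score in stm.steps:
        lines.append(f"  {action} -> score={score:.3f}")
    return "\n".join(lines)


rng = random.Random(0)
ltm = LongTermMemory(capacity=10)
ltm.add(["vibration", "vibration temperature *"], 0.91)
stm = ShortTermMemory()
stm.add("generate", 0.88)
print(build_prompt("predict machine failure", stm, ltm, rng))
```

Clearing the short-term memory each iteration while letting the long-term memory persist across runs is what separates local trajectory refinement from global guidance.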

Modeling

Base Model: Commercial LLM APIs (specific model name not reported in text)

Training Method: Offline Proximal Policy Optimization (PPO)

Objective Functions:

Purpose: Optimize the Router's policy to maximize expected downstream task performance.

Formally: a PPO update that maximizes expected reward (the downstream score) while penalizing deviation from the behavior policy.
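For reference, the standard clipped PPO surrogate is reproduced below. Mapping it onto this setting (state s_t as the prompt, action a_t as the routing decision, advantage \hat{A}_t derived from the score, and \pi_{\theta_{\text{old}}} as the policy that collected the offline data) is an assumption; the summary does not give the paper's exact offline objective.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},
\qquad
L^{\text{CLIP}}(\theta) =
\mathbb{E}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

The clip keeps the updated Router policy close to the one that produced the logged routing decisions, which is the "penalizing deviation" described above.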

Training Data:

Triplets of (prompt, answer, score) collected from offline exploration
Prompt encodes environment state (statistics)
Answer is the routing decision
Score is downstream task performance
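The following sketch shows how the (prompt, answer, score) triplets could feed the clipped objective. The Triplet class, the mean-score baseline used as the advantage, and the idea that log-probabilities come from the Router LLM's likelihood of the answer tokens are all hypothetical; a real implementation would compute gradients through the model rather than over plain floats.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    prompt: str   # environment state (statistics) rendered as text
    answer: str   # routing decision, e.g. "select" or "generate"
    score: float  # downstream task performance


def advantages_from_scores(batch: List[Triplet]) -> List[float]:
    """Hypothetical advantage: score minus the batch-mean score."""
    baseline = sum(t.score for t in batch) / len(batch)
    return [t.score - baseline for t in batch]


def clipped_surrogate(logp_new: List[float], logp_old: List[float],
                      advantages: List[float], eps: float = 0.2) -> float:
    """Mean PPO clipped objective (to be maximized) over a batch."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_theta / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


batch = [Triplet("rows=1000, features=12", "generate", 0.82),
         Triplet("rows=1000, features=40", "select", 0.78)]
adv = advantages_from_scores(batch)
print(clipped_surrogate([-0.4, -0.9], [-0.5, -1.0], adv))
```

The two arguments of min() show the clipping at work: with a positive advantage the objective stops rewarding ratio growth beyond 1 + eps, while with a negative advantage the unclipped, more pessimistic term is kept.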

Compute: Not reported in the paper

Comparison to Prior Work

vs. Traditional Feature Engineering: Unifies selection and generation in a single iterative loop guided by a Router
vs. Standard AutoML: Uses LLM agents with memory for reasoning rather than just search algorithms
vs. DIFER [not cited in paper]: Uses multi-agent collaboration and routing rather than evolutionary algorithms for feature construction

Limitations

Quantitative results (performance metrics) are not available in the provided text
Specifics of the 'Commercial LLM APIs' used are not detailed in the provided text
Computational cost of iterative LLM calls for feature engineering may be high

Reproducibility

No code URL or specific model weights provided in the text. Operator sets and memory mechanisms are described conceptually.

📊 Experiments & Results

Evaluation Setup

Iterative feature augmentation evaluated by downstream task performance

Metrics:

Downstream task performance (Score S)
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

The paper proposes a unified framework (MAGS) that combines feature selection and generation using agentic teaming.
The method employs a Router agent, trained via offline PPO, that decides at each step whether to add or remove features.
Dual memory mechanisms allow agents to learn from both immediate feedback (short-term) and high-quality historical feature sets (long-term).
Note: Quantitative experimental results (tables, specific improvement metrics) were not included in the provided text, so specific numeric performance claims cannot be verified.

📚 Prerequisite Knowledge

Prerequisites

Feature Engineering (Selection and Generation)
Reinforcement Learning (PPO)
In-context Learning with LLMs

Key Terms

PPO: Proximal Policy Optimization, a reinforcement learning algorithm used here to fine-tune the Router agent's decision-making policy offline

Postfix expression: A mathematical notation in which operators follow their operands (e.g., 'a b +' for a + b), used to represent feature transformations as token sequences for the LLM
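Evaluating such an expression is a standard stack walk, sketched below for one data row. The operator table (including the NaN guards for division by zero and invalid log/sqrt inputs) is an assumption; the sketch also assumes feature names are single tokens that do not collide with operator symbols.

```python
import math
from typing import Dict, List

# Hypothetical operator table; invalid inputs map to NaN instead of raising.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else float("nan"),
}
UNARY = {
    "log": lambda a: math.log(a) if a > 0 else float("nan"),
    "sqrt": lambda a: math.sqrt(a) if a >= 0 else float("nan"),
}


def eval_postfix(expr: str, row: Dict[str, float]) -> float:
    """Evaluate a postfix feature expression on one data row with a stack."""
    stack: List[float] = []
    for tok in expr.split():
        if tok in BINARY:
            b = stack.pop()   # right operand is on top
            a = stack.pop()
            stack.append(BINARY[tok](a, b))
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        else:
            stack.append(row[tok])  # operand: an existing feature name
    if len(stack) != 1:
        raise ValueError(f"malformed postfix expression: {expr!r}")
    return stack[0]


print(eval_postfix("vibration temperature *", {"vibration": 0.5, "temperature": 80.0}))
```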

In-context learning: Providing examples within the LLM's prompt to guide its behavior without updating its weights

Short-term Memory: Agent-specific action sequences and feedback within the current exploration iteration

Long-term Memory: A repository of high-quality augmented feature sets from historical runs, sampled randomly to guide global optimization

Tokenization: Representing a set of features and operations as a sequence of tokens for processing by language models
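One way to turn a whole feature set into a single token sequence is to concatenate the postfix tokens of each feature with a separator token. The <sep> token and both function names are hypothetical; the summary does not specify how MAGS delimits features.

```python
from typing import List

SEP = "<sep>"  # hypothetical separator token between features


def tokenize_feature_set(features: List[str]) -> List[str]:
    """Flatten a set of postfix expressions into one token sequence."""
    tokens: List[str] = []
    for i, expr in enumerate(features):
        if i > 0:
            tokens.append(SEP)
        tokens.extend(expr.split())
    return tokens


def detokenize(tokens: List[str]) -> List[str]:
    """Inverse of tokenize_feature_set: split on the separator token."""
    features: List[str] = []
    current: List[str] = []
    for tok in tokens:
        if tok == SEP:
            features.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        features.append(" ".join(current))
    return features


seq = tokenize_feature_set(["vibration", "vibration temperature *"])
print(seq)
```

Because postfix needs no parentheses, every feature maps to a flat run of tokens, and the round trip through detokenize recovers the original set.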