HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery

📝 Paper Summary

Multi-agent systems for scientific discovery · Automated empirical research

HLER is a multi-agent pipeline that automates empirical economics research by constraining hypothesis generation to actual dataset properties and integrating human oversight for question selection and final approval.

Core Problem

Existing AI research agents often hallucinate infeasible hypotheses and struggle with the specific procedural rigor of empirical economics, such as identification strategies and data constraints.

Why it matters:

Unconstrained LLMs frequently propose research questions that require variables not present in the dataset, leading to wasted compute and dead ends.
Credible social science requires careful identification strategies and human judgment on economic significance, which fully autonomous 'AI Scientists' often lack.
Reproducibility in economics is fragile; automating the workflow with transparent code generation can help standardize evidence generation.

Concrete Example: An unconstrained LLM might propose studying the 'impact of remote work on rural wages' using a dataset that contains no remote work variable. HLER avoids this by first auditing the dataset schema and only generating questions compatible with available variables.
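The feasibility check behind this example can be sketched in a few lines of Python. This is a minimal illustration, not HLER's implementation: the `CandidateQuestion` structure, the function names, and the sample schema are all assumptions made for the sketch. It also filters questions after they are proposed, whereas the paper describes conditioning generation on the audit up front; the underlying test (every required variable must exist in the dataset) is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateQuestion:
    """A proposed research question and the variables it needs (hypothetical structure)."""
    text: str
    required_variables: frozenset[str]


def missing_variables(question: CandidateQuestion, schema: set[str]) -> set[str]:
    """Return the variables the question needs that the dataset does not contain."""
    return set(question.required_variables) - schema


def filter_feasible(questions, schema):
    """Keep only questions whose required variables all appear in the schema."""
    return [q for q in questions if not missing_variables(q, schema)]


# Illustrative schema and questions; none of these variable names come from the paper.
schema = {"wage", "region", "education_years", "age"}
questions = [
    CandidateQuestion(
        "Impact of remote work on rural wages",
        frozenset({"remote_work", "wage", "region"}),
    ),
    CandidateQuestion(
        "Returns to education by region",
        frozenset({"education_years", "wage", "region"}),
    ),
]
feasible = filter_feasible(questions, schema)
```

Here the remote-work question is dropped because `remote_work` is absent from the schema, while the education question passes.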

Key Novelty

Dataset-Aware Human-in-the-Loop Economic Research

Implements a 'dataset-aware' hypothesis generation mechanism that conditions LLM brainstorming on a structured audit of the dataset's variables and statistical distributions.
Uses a dual-loop architecture: a 'Question Quality Loop' for human selection of hypotheses, and a 'Research Revision Loop' where an automated reviewer agent iteratively critiques and requests re-analysis.
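The control flow of the two loops might look roughly like the sketch below. All function names, the round limits, and the convention that a reviewer returns `None` to accept a draft are assumptions for illustration; the summary does not specify how many rounds HLER allows or how the human signals a request for new candidates.

```python
def question_quality_loop(generate, select, max_rounds=3):
    """Propose candidate questions until the human selects one (assumed protocol)."""
    for _ in range(max_rounds):
        candidates = generate()
        choice = select(candidates)  # human returns a candidate, or None to ask for new ones
        if choice is not None:
            return choice
    raise RuntimeError("no question selected within the round limit")


def research_revision_loop(question, analyze, write, review, max_revisions=3):
    """Re-run analysis and drafting until the reviewer accepts or the limit is hit."""
    feedback = None
    draft = None
    for round_no in range(1, max_revisions + 1):
        results = analyze(question, feedback)  # re-analysis can use the last critique
        draft = write(question, results)
        feedback = review(draft)  # None means the reviewer has no further requests
        if feedback is None:
            return draft, round_no
    return draft, max_revisions


# Stub callables stand in for the LLM agents and the human.
picked = question_quality_loop(
    generate=lambda: ["Q1", "Q2"],
    select=lambda candidates: candidates[1],
)
reviews = iter(["add robustness checks", None])
draft, rounds = research_revision_loop(
    picked,
    analyze=lambda q, fb: {"question": q, "addressed": fb},
    write=lambda q, res: f"Draft on {q} (addressed: {res['addressed']})",
    review=lambda d: next(reviews),
)
```

Keeping the two loops as separate functions mirrors the separation in the text: the first loop ends with a human decision, the second with an automated reviewer's acceptance.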

Evaluation Highlights

Dataset-aware generation produced feasible research questions in 87% of cases, compared to only 41% for unconstrained LLM ideation.
Reduced the rate of infeasible/hallucinated hypotheses from 59% (unconstrained) to 13% (dataset-aware).
Produced complete end-to-end empirical manuscripts across 14 runs, at a reported average API cost of $0.8-$1.5 per paper.
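The first two highlights report the same comparison from opposite sides: the infeasible rate is the complement of the feasible rate. The short check below makes that explicit and derives the absolute and relative changes. The derived figures are plain arithmetic on the reported percentages, not numbers stated in the paper, and the variable names are chosen for this sketch.

```python
# Reported feasible rates, in percent.
feasible_unconstrained = 41
feasible_dataset_aware = 87

# Infeasible rates are the complements of the feasible rates.
infeasible_unconstrained = 100 - feasible_unconstrained  # 59
infeasible_dataset_aware = 100 - feasible_dataset_aware  # 13

# Derived quantities (not reported in the paper).
absolute_gain_points = feasible_dataset_aware - feasible_unconstrained
relative_reduction = (
    infeasible_unconstrained - infeasible_dataset_aware
) / infeasible_unconstrained

print(f"Feasible-rate gain: {absolute_gain_points} percentage points")
print(f"Relative reduction in infeasible questions: {relative_reduction:.0%}")
```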

Breakthrough Assessment

7/10

A significant practical step toward constraining AI-driven scientific discovery to what the available data can support. While not a new fundamental algorithm, the architectural integration of data auditing and human gates addresses a major pain point in applied AI research.

⚙️ Technical Details

Problem Definition

Setting: Automating the end-to-end empirical research workflow (from data to manuscript) for economics while maintaining human oversight.

Inputs: Raw dataset (e.g., survey data) and optional user research interests.

Outputs: Full research manuscript (PDF/Markdown) including statistical tables and identification strategy.

Pipeline Flow

Data Preparation: DataAuditAgent → DataProfilingAgent
Ideation: QuestionAgent (with Human Selection Gate)
Execution: Data Collection → EconometricsAgent → PaperAgent
Review Loop: ReviewerAgent → (Re-analysis + Revision) if needed → Publication Gate

System Modules

DataAuditAgent (Data Preparation)

Validates dataset structure and creates a variable inventory to prevent hallucinated inputs.
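A variable inventory of this kind could be built along the following lines. This is a standard-library sketch under stated assumptions: the function name, the inventory fields (`non_missing`, `missing`, `numeric`), and the sample data are hypothetical, and the summary does not describe what the actual DataAuditAgent records.

```python
import csv
import io


def build_variable_inventory(rows):
    """Summarize each column: missing and non-missing counts, and whether values parse as numbers."""
    inventory = {}
    for row in rows:
        for name, value in row.items():
            if name is None:
                continue  # skip surplus fields that csv.DictReader files under the None key
            entry = inventory.setdefault(
                name, {"non_missing": 0, "missing": 0, "numeric": True}
            )
            if value is None or value.strip() == "":
                entry["missing"] += 1
                continue
            entry["non_missing"] += 1
            try:
                float(value)
            except ValueError:
                entry["numeric"] = False  # at least one value is not a number
    return inventory


# Illustrative data only; these columns are not taken from the paper.
sample_csv = """wage,region,education_years
12.5,north,12
,south,16
20.0,north,
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
inventory = build_variable_inventory(rows)
available_variables = set(inventory)
```

The resulting `available_variables` set is the kind of input a downstream question generator could be restricted to, so that it never references a column the dataset lacks. Note that in this sketch a column with only missing values keeps `numeric` as `True`, so a real audit would need to treat empty columns separately.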