ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics

📝 Paper Summary

Agentic RAG pipeline | Single-cell genomics analysis

ELISA integrates transcriptomic foundation model embeddings with semantic text retrieval via an adaptive routing mechanism to enable interpretable, natural-language discovery in single-cell genomics without retraining.

Core Problem

Agentic AI systems excel at text reasoning but lack access to transcriptomic data structure, while expression foundation models (like scGPT) capture cellular representations but are opaque to natural language queries.

Why it matters:

Translating statistical outputs (differential expression, pathways) into mechanistic biological hypotheses is currently labor-intensive and difficult to reproduce
Existing agents operate on curated text/databases and cannot query raw high-dimensional expression data
Expression foundation models are not designed for semantic querying, creating a disconnect between latent spaces and biological concepts

Concrete Example: When a researcher queries a gene signature like 'MARCO FABP4 APOC1...', text-aligned models like CellWhisperer fail (MRR ~0.40) because they optimize for natural language descriptions, ignoring the explicit expression signal that ELISA's gene-scoring pipeline captures (MRR ~0.81).

Key Novelty

Embedding-Linked Interactive Single-cell Agent (ELISA)

Unifies scGPT expression embeddings and BioBERT semantic embeddings in a shared representation without retraining, enabling dual-modality access
Employs an automatic query classifier to route inputs to the optimal retrieval pipeline: gene marker scoring for signatures, semantic similarity for concepts, or reciprocal rank fusion for mixed queries

Architecture

The complete ELISA framework showing how user queries are routed, retrieved, analyzed, and interpreted.

Evaluation Highlights

Significantly outperforms CellWhisperer in cell type retrieval (combined permutation test, p<0.001) across six datasets
Achieves a very large effect on gene-signature queries (Cohen’s d = 5.98 for MRR) by leveraging dedicated expression scoring
Replicates published biological findings with high fidelity (mean composite score 0.90), including near-perfect pathway alignment (0.98)

Breakthrough Assessment

8/10

Bridging the gap between expression foundation models and semantic agents is a significant architectural advance for scientific AI. The retrieval gains on expression queries are large: average MRR across six datasets roughly doubles, from 0.397 to 0.806.

⚙️ Technical Details

Problem Definition

Setting: Retrieval and interpretation of cell types and biological signals from single-cell RNA sequencing (scRNA-seq) atlases using natural language or gene lists

Inputs: Natural language query or gene signature list

Outputs: Ranked cell clusters, analytical plots (pathways, interactions), and structured LLM-generated biological hypothesis

Pipeline Flow

Query Classifier (Routes to Gene, Ontology, or Mixed pipeline)
Retrieval Engine (Executes routed strategy on embeddings)
Analytical Modules (Computes biological statistics on retrieved clusters)
LLM Interpreter (Synthesizes results into hypotheses)
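
The four stages above can be read as a simple function composition. The sketch below is only an illustration of that flow: `run_pipeline`, `PipelineResult`, and the stub stages are hypothetical names introduced here, not ELISA's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    route: str                    # "gene", "ontology", or "mixed"
    clusters: List[str]           # ranked cluster identifiers
    statistics: Dict[str, float]  # outputs of the analytical modules
    hypothesis: str               # text produced by the interpreter


def run_pipeline(
    query: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str], List[str]],
    analyze: Callable[[List[str]], Dict[str, float]],
    interpret: Callable[[str, Dict[str, float]], str],
) -> PipelineResult:
    """Chain the four stages; each stage is injected so it can be swapped or stubbed."""
    route = classify(query)
    clusters = retrieve(query, route)
    statistics = analyze(clusters)
    hypothesis = interpret(query, statistics)
    return PipelineResult(route, clusters, statistics, hypothesis)


# Stub stages, for illustration only (no real retrieval or LLM call happens here).
result = run_pipeline(
    "MARCO FABP4 APOC1",
    classify=lambda q: "gene",
    retrieve=lambda q, route: ["cluster_3", "cluster_7"],
    analyze=lambda clusters: {"n_clusters": float(len(clusters))},
    interpret=lambda q, stats: f"{q}: {int(stats['n_clusters'])} candidate clusters",
)
```

Passing the stages in as callables keeps the sketch independent of any particular embedding model or LLM provider.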

System Modules

Query Classifier

Determine if query is a gene signature, concept, or mixture

Model or implementation: Token-level heuristics (Rule-based)
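
A rule-based classifier of this kind could look roughly like the sketch below. The summary does not give the actual rules, so both the gene-symbol pattern and the thresholds are assumptions chosen for illustration; `classify_query` is a hypothetical name.

```python
import re

# Assumption: a token "looks like" a gene symbol if it starts with an uppercase
# letter and otherwise contains only uppercase letters, digits, or hyphens.
GENE_LIKE = re.compile(r"^[A-Z][A-Z0-9-]{1,9}$")


def classify_query(query: str, hi: float = 0.8, lo: float = 0.2) -> str:
    """Return "gene", "ontology", or "mixed" from the fraction of gene-like tokens.

    The thresholds `hi` and `lo` are illustrative, not values from the paper.
    """
    tokens = [t.strip(".,;:()") for t in query.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return "ontology"
    gene_fraction = sum(bool(GENE_LIKE.match(t)) for t in tokens) / len(tokens)
    if gene_fraction >= hi:
        return "gene"
    if gene_fraction <= lo:
        return "ontology"
    return "mixed"
```

A pattern this simple will misfire on uppercase acronyms such as "DNA", which is one reason a real implementation would more likely check tokens against the dataset's gene vocabulary.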

Hybrid Retrieval Engine

Identify relevant cell clusters from the embedding file

Model or implementation: BioBERT (Semantic) + Weighted Scoring (Gene) + RRF (Fusion)
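
For mixed queries, the fusion step merges the gene-based and semantic rankings. Below is a minimal sketch of standard reciprocal rank fusion, where each item scores the sum of 1/(k + rank) over the lists it appears in. The constant `k = 60` is the commonly used default, not a value confirmed for ELISA, and the cluster names are toy data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists; items ranked high in many lists come first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


gene_ranking = ["c2", "c1", "c3"]      # toy output of the gene-scoring pipeline
semantic_ranking = ["c1", "c4", "c2"]  # toy output of the semantic pipeline
fused = reciprocal_rank_fusion([gene_ranking, semantic_ranking])
print(fused)  # ['c1', 'c2', 'c4', 'c3']
```

Because RRF uses only ranks, it needs no calibration between the gene scores and the embedding similarities, which live on different scales.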

Analytical Suite

Compute biological metrics on retrieved data

Model or implementation: Statistical algorithms (non-AI)

LLM Interpreter

Synthesize statistical outputs into natural language hypotheses

Model or implementation: LLaMA-3.1-8B-Instant

Novel Architectural Elements

Hybrid routing mechanism that dynamically selects between gene-marker scoring and semantic embedding retrieval based on query content
Unified embedding file structure combining scGPT expression vectors and BioBERT semantic vectors without retraining the foundation models
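
To make the gene-marker route concrete, here is one way a weighted marker score could be computed. The scoring rule (sum of per-cluster marker weights for the queried genes) and the weights themselves are assumptions for illustration; the summary does not specify ELISA's exact formula.

```python
from typing import Dict, Iterable, List, Tuple


def score_clusters(
    query_genes: Iterable[str],
    cluster_markers: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank clusters by the summed weight of the query genes among their markers."""
    wanted = {g.upper() for g in query_genes}
    scored = []
    for cluster, markers in cluster_markers.items():
        total = sum(w for gene, w in markers.items() if gene.upper() in wanted)
        scored.append((cluster, total))
    # Highest score first; ties broken by cluster name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy marker weights (made-up numbers, e.g. standing in for log fold changes).
toy_markers = {
    "alveolar_macrophage": {"MARCO": 3.0, "FABP4": 2.5, "APOC1": 2.0},
    "t_cell": {"CD3E": 3.0, "APOC1": 0.5},
}
ranking = score_clusters(["MARCO", "FABP4", "APOC1"], toy_markers)
print(ranking)  # [('alveolar_macrophage', 7.5), ('t_cell', 0.5)]
```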

Modeling

Base Model: LLaMA-3.1-8B-Instant (for reasoning), scGPT and BioBERT (for embeddings)

Comparison to Prior Work

vs. CellWhisperer: ELISA adds dedicated gene-marker scoring and analytical modules (pathways/interactions), whereas CellWhisperer relies on text-transcriptome alignment only
vs. scGPT: ELISA wraps scGPT embeddings in a semantic retrieval and reasoning layer, whereas scGPT is a raw representation model
vs. GeneGPT: ELISA operates on raw expression embeddings and analytical modules, while GeneGPT [not cited in paper] typically retrieves from textual databases

Limitations

Gene coverage is high (0.85) but not exhaustive, missing genes in rare cell states
Interaction recovery is limited (0.77) for mechanisms outside transcriptomic scope (e.g. trafficking)
Pathway-centric framework may miss ancestry-related transcriptional programs not captured in standard gene sets

Reproducibility

Code: https://github.com/omaruno/ELISA-An-AI-Agent-for-Expression-Grounded-Discovery-in-Single-Cell-Genomics

Uses publicly available scRNA-seq datasets from CZ CELLxGENE. LLaMA-3.1-8B-Instant is accessed via the Groq API.

📊 Experiments & Results

Evaluation Setup

Retrieval of cell types from 6 diverse scRNA-seq datasets and replication of biological findings

Benchmarks:

Retrieval Benchmark (cell type retrieval; ontology vs. expression queries) [New]
Discovery Benchmark (Replication of published biological findings) [New]

Metrics:

Cluster Recall@k
Mean Reciprocal Rank (MRR)
Composite Discovery Score (Gene Coverage, Pathway Alignment, etc.)
Statistical methodology: Combined permutation test (50,000 permutations), paired t-tests, Cohen's d effect size
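
The two retrieval metrics can be computed as in the sketch below. It uses the textbook definitions: Recall@k is the fraction of relevant clusters found in the top k, and MRR is the mean of 1/rank of the first relevant cluster. The paper's exact handling of multi-cluster ground truth is assumed to match, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant clusters that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant cluster, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant-set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


runs = [(["a", "b"], {"b"}), (["x", "y"], {"x"})]
print(recall_at_k(["a", "b", "c", "d"], {"b", "d"}, k=2))  # 0.5
print(mean_reciprocal_rank(runs))                          # 0.75
```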

Key Results

Retrieval performance shows ELISA dominating on expression-based queries while matching or exceeding baselines on ontology queries.

Benchmark	Metric	Baseline	This Paper	Δ
Average across 6 datasets	MRR	0.397	0.806	+0.409
Average across 6 datasets	Recall@5	Not reported in the paper	Not reported in the paper	+0.29
Average across 6 datasets	MRR	Not reported in the paper	Not reported in the paper	+0.15

Biological replication metrics demonstrate ELISA's ability to recover ground truth findings from raw data.

Benchmark	Metric	Baseline	This Paper	Δ
6 Reference Studies	Mean Composite Score	1.00	0.90	-0.10
6 Reference Studies	Pathway Alignment	1.00	0.98	-0.02

Experiment Figures

Radar plots comparing retrieval performance (Recall, MRR) of CellWhisperer vs ELISA variants across 6 datasets.

Main Takeaways

Hybrid routing is essential: No single modality dominates; semantic pipeline wins on ontology queries, gene pipeline wins on signature queries.
ELISA successfully bridges the gap between opaque scGPT embeddings and natural language reasoning without retraining foundation models.
The system is robust across diverse tissues (lung, brain, cancer) and experimental designs (developmental, case-control).

📚 Prerequisite Knowledge

Prerequisites

Single-cell RNA sequencing (scRNA-seq) analysis workflows
Retrieval-Augmented Generation (RAG)
Foundation models (scGPT, BERT, LLaMA)

Key Terms

scRNA-seq: Single-cell RNA sequencing—a technology that measures gene expression levels for individual cells

scGPT: A generative pre-trained foundation model for single-cell biology that learns latent representations of cells

DE: Differential Expression—statistical analysis to find genes that distinguish one cell group from others

RRF: Reciprocal Rank Fusion—an algorithm to combine ranked lists from multiple retrieval systems

MRR: Mean Reciprocal Rank, the average over queries of 1/rank of the first relevant result (1.0 means the correct result is always ranked first)

BioBERT: A pre-trained language model optimized for biomedical text mining

Cohen's d: A measure of effect size indicating the standardized difference between two means
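
As a worked illustration of the effect-size definition, the sketch below computes Cohen's d with the pooled standard deviation. This is the common two-sample form; the summary does not say which variant (pooled or paired-difference) the paper uses, so treat the choice as an assumption. The input numbers are arbitrary.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Means 4 and 3, both sample variances 4, so d = (4 - 3) / 2 = 0.5.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

By the usual rule of thumb, d around 0.8 already counts as a large effect, which puts the reported 5.98 in context.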

AnnData: Annotated Data, a Python data structure (stored on disk as .h5ad files) that holds a single-cell expression matrix together with its cell and gene annotations