Evaluation Setup
Retrieval of cell types from 6 diverse scRNA-seq datasets and replication of biological findings
Benchmarks:
- Retrieval Benchmark: cell type retrieval with ontology vs. expression queries [New]
- Discovery Benchmark: replication of published biological findings [New]
Metrics:
- Cluster Recall@k
- Mean Reciprocal Rank (MRR)
- Composite Discovery Score (Gene Coverage, Pathway Alignment, etc.)
- Statistical methodology: Combined permutation test (50,000 permutations), paired t-tests, Cohen's d effect size
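As a rough illustration of the two retrieval metrics listed above, the following sketch computes Recall@k and MRR over ranked lists of cluster labels. It assumes that Cluster Recall@k means the fraction of expected clusters found among the top k retrieved clusters; the summary does not define it, so this reading is an assumption. The function names and the example queries are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant clusters that appear in the top-k results.

    Assumed reading of "Cluster Recall@k"; the paper may define it differently.
    """
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant cluster, or 0.0 if none is retrieved."""
    for rank, cluster in enumerate(ranked, start=1):
        if cluster in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical queries with made-up cluster labels (not data from the paper).
queries = [
    (["T cell", "B cell", "NK cell"], {"B cell"}),          # first hit at rank 2
    (["AT2", "AT1", "fibroblast"], {"AT2", "fibroblast"}),  # first hit at rank 1
]

mrr = mean_reciprocal_rank(queries)
recall_2 = recall_at_k(queries[1][0], queries[1][1], k=2)
```

Averaging the per-query values within each dataset and then across datasets would give figures comparable in form to the "Average across 6 datasets" rows, although the exact aggregation used in the paper is not stated here.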
Key Results
Retrieval performance shows ELISA dominating on expression-based queries while matching or exceeding baselines on ontology queries.
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average across 6 datasets | MRR | 0.397 | 0.806 | +0.409 |
| Average across 6 datasets | Recall@5 | Not reported in the paper | Not reported in the paper | +0.29 |
| Average across 6 datasets | MRR | Not reported in the paper | Not reported in the paper | +0.15 |

Biological replication metrics demonstrate ELISA's ability to recover ground truth findings from raw data.
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| 6 Reference Studies | Mean Composite Score | 1.00 | 0.90 | -0.10 |
| 6 Reference Studies | Pathway Alignment | 1.00 | 0.98 | -0.02 |
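The Δ values above are differences between paired per-dataset scores, and the statistical methodology listed under Metrics names a permutation test and Cohen's d. The sketch below shows one common way such a comparison could be run: a paired sign-flip permutation test together with a paired-samples Cohen's d. It is not the paper's "combined" permutation procedure, whose details are not given in this summary. The function names and the per-dataset scores are hypothetical.

```python
import random
import statistics
from typing import Sequence


def cohens_d_paired(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired-samples Cohen's d: mean difference / std. dev. of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def sign_flip_permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 50_000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo p-value for the mean paired difference.

    Under the null hypothesis the sign of each paired difference is
    exchangeable, so every permutation flips each sign with probability 0.5.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_permutations):
        flipped_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped_sum / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-dataset scores for illustration only (not the paper's data).
system_scores = [0.80, 0.75, 0.90, 0.85, 0.70, 0.84]
baseline_scores = [0.40, 0.35, 0.45, 0.50, 0.30, 0.38]

p_value = sign_flip_permutation_test(system_scores, baseline_scores, n_permutations=5_000)
effect_size = cohens_d_paired(system_scores, baseline_scores)
```

With only six paired datasets, a two-sided sign-flip test cannot produce a p-value below 2/2^6 ≈ 0.031 no matter how many permutations are drawn, which is one reason to report an effect size alongside it.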
Main Takeaways
- Hybrid routing is essential: no single modality dominates. The semantic pipeline wins on ontology queries, while the gene pipeline wins on signature queries.
- ELISA successfully bridges the gap between opaque scGPT embeddings and natural language reasoning without retraining foundation models.
- The system is robust across diverse tissues and disease contexts (lung, brain, cancer) and experimental designs (developmental, case-control).