GenoMAS

A Multi-Agent Framework for Scientific Discovery via
Code-Driven Gene Expression Analysis

Haoyang Liu1, Yijiang Li2, Haohan Wang1

1University of Illinois at Urbana-Champaign    2University of California, San Diego

Abstract

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry.

GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas.

At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data.

  • 89.13% Composite Similarity Correlation (CSC) on data preprocessing
  • 60.48% F₁ score on gene identification
  • +10.61% CSC improvement over prior art
  • +16.85% F₁ improvement over prior art

Key Contributions

🤖 Multi-Agent Collaboration

Six specialized LLM agents working as collaborative programmers, not just tool orchestrators, enabling end-to-end code generation for complex genomic analysis tasks.

🧭 Guided Planning Framework

Context-aware planning mechanism that encodes workflows as editable Action Units, balancing precise control with autonomous error handling and adaptive decision-making.

🎭 Heterogeneous LLM Architecture

Diverse ensemble of state-of-the-art LLMs (Claude Sonnet 4, OpenAI o3, Gemini 2.5 Pro) with complementary strengths in coding, reasoning, and scientific knowledge.

🔬 Scientific Rigor

Validated on the GenoTEX benchmark with real genomic datasets, demonstrating biologically plausible gene-phenotype associations corroborated by the literature.

Method Overview

GenoMAS System Architecture

Multi-agent collaboration in GenoMAS. Six specialized agents coordinate through typed message-passing protocols.
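To make "typed message-passing" concrete, here is a minimal Python sketch. The six roles follow the agents listed on this page, but the message types, the `ALLOWED` sender table, and the `Router` class are hypothetical illustrations, not the GenoMAS implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Deque, Dict, Optional


class Role(Enum):
    PI = auto()
    GEO_ENGINEER = auto()
    TCGA_ENGINEER = auto()
    STATISTICIAN = auto()
    CODE_REVIEWER = auto()
    DOMAIN_EXPERT = auto()


class MsgType(Enum):  # hypothetical message types
    TASK_ASSIGNMENT = auto()
    PLANNING_REQUEST = auto()
    REVIEW_REQUEST = auto()
    REVIEW_RESPONSE = auto()
    EXPERT_QUERY = auto()
    EXPERT_RESPONSE = auto()
    TASK_COMPLETE = auto()


@dataclass(frozen=True)
class Message:
    sender: Role
    receiver: Role
    kind: MsgType
    payload: str


PROGRAMMERS = frozenset({Role.GEO_ENGINEER, Role.TCGA_ENGINEER, Role.STATISTICIAN})

# Hypothetical protocol table: which roles may send each message type.
ALLOWED = {
    MsgType.TASK_ASSIGNMENT: frozenset({Role.PI}),
    MsgType.PLANNING_REQUEST: PROGRAMMERS,
    MsgType.REVIEW_REQUEST: PROGRAMMERS,
    MsgType.REVIEW_RESPONSE: frozenset({Role.CODE_REVIEWER}),
    MsgType.EXPERT_QUERY: PROGRAMMERS,
    MsgType.EXPERT_RESPONSE: frozenset({Role.DOMAIN_EXPERT}),
    MsgType.TASK_COMPLETE: PROGRAMMERS,
}


class Router:
    """Delivers messages to per-role inboxes after checking the protocol."""

    def __init__(self) -> None:
        self.inboxes: Dict[Role, Deque[Message]] = defaultdict(deque)

    def send(self, msg: Message) -> None:
        # Reject messages whose sender is not allowed to emit this type.
        if msg.sender not in ALLOWED[msg.kind]:
            raise ValueError(f"{msg.sender.name} may not send {msg.kind.name}")
        self.inboxes[msg.receiver].append(msg)

    def receive(self, role: Role) -> Optional[Message]:
        # Return the oldest pending message for this role, or None.
        box = self.inboxes[role]
        return box.popleft() if box else None
```

Typing each message lets the router reject malformed traffic (for example, a Code Reviewer trying to issue a task assignment) before any agent acts on it.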

Agent Architecture

🎯 Orchestration Agent

PI Agent: Coordinates the entire workflow, assigns tasks dynamically, and manages dependencies.

💻 Programming Agents

Data Engineers (GEO & TCGA): Handle platform-specific data preprocessing with specialized knowledge.

Statistician Agent: Conducts statistical analysis, regression modeling, and identifies trait-associated genes.

👥 Advisory Agents

Code Reviewer: Validates generated code for functionality and conformance.

Domain Expert: Provides biomedical insights for biological decisions.
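The PI Agent's job of assigning tasks while managing dependencies can be pictured as scheduling over a dependency graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `schedule` helper are hypothetical, and in GenoMAS the PI Agent makes these decisions dynamically with an LLM rather than from a fixed graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical task graph: each task maps to the tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "geo_preprocess": set(),
    "tcga_preprocess": set(),
    "gene_trait_analysis": {"geo_preprocess", "tcga_preprocess"},
}
# Hypothetical owners, matching the agent roles described above.
OWNER: Dict[str, str] = {
    "geo_preprocess": "GEO Data Engineer",
    "tcga_preprocess": "TCGA Data Engineer",
    "gene_trait_analysis": "Statistician",
}


def schedule(tasks: Dict[str, Set[str]], owner: Dict[str, str]) -> List[List[Tuple[str, str]]]:
    """Group tasks into rounds whose dependencies are already satisfied."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        rounds.append([(task, owner[task]) for task in ready])
        # Every assigned task is marked finished at once here; a real
        # coordinator would call done() only when an agent reports back.
        sorter.done(*ready)
    return rounds


for rnd in schedule(TASKS, OWNER):
    print(rnd)
```

Both preprocessing tasks land in the first round and can run in parallel; the statistical analysis is only released once both are done.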

Programming Agent Workflow

Programming Agent Architecture

Planning, memory, and self-correction mechanisms of programming agents.

📋 Action Units

Workflows decomposed into semantically coherent operations that can be executed atomically, revised, or reordered based on context.
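The sketch below shows one way the advance, revise, bypass, or backtrack choices described in the abstract could drive execution over a list of Action Units. It is a simplified illustration under stated assumptions: `ActionUnit`, `Decision`, and `guided_plan` are hypothetical names, and `planner` stands in for the LLM call that actually makes each choice in GenoMAS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Decision(Enum):
    ADVANCE = "advance"      # run the current Action Unit
    REVISE = "revise"        # re-run the current unit after its code was revised
    BYPASS = "bypass"        # skip the current unit
    BACKTRACK = "backtrack"  # jump back to an earlier unit


@dataclass
class ActionUnit:
    name: str
    run: Callable[[dict], bool]  # executes the unit's code; True on success


History = List[Tuple[str, str]]
# Hypothetical planner signature: (index, units, history) -> (decision, backtrack target)
Planner = Callable[[int, List[ActionUnit], History], Tuple[Decision, Optional[int]]]


def guided_plan(units: List[ActionUnit], planner: Planner, state: dict,
                max_steps: int = 50) -> History:
    """Step through Action Units, letting the planner choose each move."""
    history: History = []
    i = 0
    for _ in range(max_steps):  # hard cap guards against endless revise/backtrack cycles
        if i >= len(units):
            break
        decision, target = planner(i, units, history)
        unit = units[i]
        if decision is Decision.BYPASS:
            history.append((unit.name, "bypassed"))
            i += 1
        elif decision is Decision.BACKTRACK:
            if target is None or not 0 <= target < i:
                raise ValueError("backtrack target must be an earlier unit")
            history.append((unit.name, f"backtrack to {units[target].name}"))
            i = target
        else:
            # ADVANCE and REVISE both execute the current unit; a failure
            # keeps the pointer in place so the planner can choose to revise.
            ok = unit.run(state)
            history.append((unit.name, decision.value if ok else "failed"))
            if ok:
                i += 1
    return history


# Usage: "map_ids" fails once, and the planner responds by revising it.
def flaky(name: str, failures: int) -> Callable[[dict], bool]:
    def run(state: dict) -> bool:
        state[name] = state.get(name, 0) + 1
        return state[name] > failures
    return run


def revise_on_failure(i: int, units: List[ActionUnit], history: History):
    if history and history[-1] == (units[i].name, "failed"):
        return Decision.REVISE, None
    return Decision.ADVANCE, None


units = [ActionUnit("load", flaky("load", 0)),
         ActionUnit("map_ids", flaky("map_ids", 1)),
         ActionUnit("normalize", flaky("normalize", 0))]
print(guided_plan(units, revise_on_failure, {}))
# [('load', 'advance'), ('map_ids', 'failed'), ('map_ids', 'revise'), ('normalize', 'advance')]
```

Keeping the plan as an editable list of units, rather than a fixed script, is what lets a single failure be handled locally by revising one unit instead of restarting the whole pipeline.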

🔄 Iterative Refinement

Three-stage code generation process: writing, review, and revision, with isolated context for independent assessment.
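A minimal sketch of the write-review-revise cycle, assuming the writer, reviewer, and reviser are plain callables (hypothetical stand-ins for LLM calls). Passing the reviewer only the task and the current code is one simple way to model an isolated review context; `max_rounds` corresponds loosely to the number of review rounds varied in the ablation study.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    feedback: str


def refine(task: str,
           write: Callable[[str], str],
           review: Callable[[str, str], Review],
           revise: Callable[[str, str, str], str],
           max_rounds: int = 2) -> Tuple[str, bool, List[str]]:
    """Write code, then alternate review and revision for up to max_rounds.

    The reviewer receives only (task, code), never the writer's prompt
    history, which is one simple way to keep its context isolated.
    """
    code = write(task)
    feedback_log: List[str] = []
    for _ in range(max_rounds):
        verdict = review(task, code)
        if verdict.approved:
            return code, True, feedback_log
        feedback_log.append(verdict.feedback)
        code = revise(task, code, verdict.feedback)
    # One last review so the caller knows whether the final revision passed.
    return code, review(task, code).approved, feedback_log


# Toy stand-ins for the three roles.
def toy_write(task: str) -> str:
    return "v0"


def toy_review(task: str, code: str) -> Review:
    ok = code.endswith("+fix")
    return Review(ok, "" if ok else "handle missing values")


def toy_revise(task: str, code: str, feedback: str) -> str:
    return code + "+fix"


print(refine("normalize expression matrix", toy_write, toy_review, toy_revise))
# ('v0+fix', True, ['handle missing values'])
```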

💾 Dynamic Memory

Validated code snippets stored for reuse, achieving a reuse rate of about 65% and substantial efficiency gains.
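A minimal sketch of a validated-code memory and how a reuse rate can be measured. Keying snippets only by Action Unit name is a simplifying assumption (a real system would likely condition on more context), and `CodeMemory` and the dataset and unit names are hypothetical.

```python
from typing import Dict, Optional


class CodeMemory:
    """Caches validated code per Action Unit and tracks the reuse rate."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.lookups = 0
        self.hits = 0

    def get(self, unit: str) -> Optional[str]:
        self.lookups += 1
        code = self._store.get(unit)
        if code is not None:
            self.hits += 1
        return code

    def put(self, unit: str, code: str, validated: bool) -> None:
        if validated:  # unvalidated code is never stored
            self._store[unit] = code

    @property
    def reuse_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


memory = CodeMemory()
for dataset in ["cohort_1", "cohort_2", "cohort_3"]:  # hypothetical datasets
    for unit in ["load_matrix", "map_gene_ids"]:
        if memory.get(unit) is None:
            generated = f"# code for {unit}, first written for {dataset}"
            memory.put(unit, generated, validated=True)
print(f"reuse rate: {memory.reuse_rate:.2f}")  # reuse rate: 0.67
```

The rate climbs as later datasets hit snippets cached from earlier ones, which is the kind of stabilizing behavior the reported reuse rate describes.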

🧬 Domain Expertise

Consultation with Domain Expert for biomedical reasoning, gene identifier mapping, and clinical feature extraction.
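Gene identifier mapping in GEO-style preprocessing commonly means translating platform probe IDs to gene symbols and merging probes that hit the same gene. The sketch below shows that general step, assuming averaging as the aggregation rule; GenoMAS's actual mapping logic may differ, and the probe IDs and values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def probes_to_genes(expr: Dict[str, List[float]],
                    probe_map: Dict[str, str]) -> Dict[str, List[float]]:
    """Collapse probe-level rows to gene-level rows.

    expr: probe ID -> expression values across samples (equal lengths assumed).
    probe_map: probe ID -> gene symbol, e.g. from a platform annotation.
    Unmapped probes are dropped; probes sharing a gene are averaged per sample.
    """
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_map.get(probe)
        if gene:
            grouped[gene].append(values)
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


expr = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 6.0], "p4": [0.0, 0.0]}
probe_map = {"p1": "TP53", "p2": "TP53", "p3": "BRCA1"}
print(probes_to_genes(expr, probe_map))  # {'TP53': [2.0, 3.0], 'BRCA1': [5.0, 6.0]}
```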

Experimental Results

End-to-End Performance Comparison

Main Results Table

GenoMAS achieves state-of-the-art performance across all metrics, reaching an F₁ score of 60.48% and an AUROC of 0.81 while reducing API costs by 44.7%.
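For reference, F₁ is the harmonic mean of precision and recall. A minimal sketch of a set-based F₁ for gene identification, assuming predictions and ground truth are plain sets of gene symbols; GenoTEX's exact evaluation protocol may differ.

```python
from typing import Iterable


def gene_set_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """F1 between a predicted gene set and a reference gene set."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)  # genes found in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# 2 of 4 predictions are correct (precision 1/2) and 2 of 3 true genes are found (recall 2/3).
print(round(gene_set_f1({"A", "B", "C", "D"}, {"A", "B", "E"}), 4))  # 0.5714
```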

Component-wise Performance

Individual Task Performance

Performance breakdown across dataset filtering, selection, and preprocessing tasks, demonstrating consistent superiority over baselines.

Key Findings

  • 98.78% success rate in generating executable analysis pipelines
  • 89.13% CSC for data preprocessing, excelling in gene expression data handling (91.15%)
  • Heterogeneous LLMs provide a +7.5% F₁ improvement over homogeneous configurations
  • Guided planning enables dynamic adaptation and efficient error recovery
  • Memory reuse stabilizes at a rate of about 65%, saving substantial computation

Ablation Study Insights

  • Removing planning mechanism: -9.21% F₁
  • Excluding Domain Expert: -12.91% F₁
  • Single review round: -13.87% F₁
  • No reviewer: -35.50% F₁

Each component contributes significantly to overall system performance and scientific rigor.

Agent Collaboration Patterns


Communication Analysis

Analysis of 2,500+ agent interactions reveals efficient task coordination:

  • Data Engineers account for 56.9% of activity, reflecting their central role in processing gene expression data
  • PI messages make up only 2.3% of traffic, indicating high system autonomy (97.7% self-coordinated)
  • 634 planning requests show how heavily agents rely on the guided planning framework
  • Few error messages (36 revisions) indicate effective error prevention
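Statistics like these come from simple counting over an interaction log. A minimal sketch, assuming a hypothetical log schema of (sender role, message type) pairs with made-up entries; the "self-coordinated" share is just 100 minus the PI's share, which is how 97.7% follows from 2.3%.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LogEntry = Tuple[str, str]  # (sender role, message type); hypothetical schema


def sender_shares(log: Iterable[LogEntry]) -> Dict[str, float]:
    """Percentage of logged messages sent by each role."""
    counts = Counter(sender for sender, _ in log)
    total = sum(counts.values())
    return {role: 100 * n / total for role, n in counts.items()} if total else {}


def count_kind(log: Iterable[LogEntry], kind: str) -> int:
    """Number of logged messages of one type."""
    return sum(1 for _, k in log if k == kind)


toy_log: List[LogEntry] = [
    ("PI", "task_assignment"),
    ("GEO Data Engineer", "planning_request"),
    ("GEO Data Engineer", "review_request"),
    ("Code Reviewer", "review_response"),
]
shares = sender_shares(toy_log)
print(shares["PI"], 100 - shares["PI"])  # 25.0 75.0
print(count_kind(toy_log, "planning_request"))  # 1
```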

These patterns highlight the benefits of role specialization, cognitive diversity through heterogeneous LLMs, and distributed expertise in multi-agent systems.

Paper & Citation

Citation

@misc{liu2025genomasmultiagentframeworkscientific,
      title={GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis}, 
      author={Haoyang Liu and Yijiang Li and Haohan Wang},
      year={2025},
      eprint={2507.21035},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.21035}, 
}

Team

Haoyang Liu

University of Illinois at Urbana-Champaign

hl57@illinois.edu

Yijiang Li

University of California, San Diego

yijiangli@ucsd.edu

Haohan Wang

University of Illinois at Urbana-Champaign

haohanw@illinois.edu

Acknowledgments

This research was supported by the National AI Research Resource (NAIRR) under grant number 240283.