Genome modelling and design across all domains of life with Evo 2

Garyk Brixi(Palo Alto Institute), Matthew G. Durrant(Palo Alto Institute), Jerome Ku(Palo Alto Institute), Mohsen Naghipourfar(Palo Alto Institute), Michael Poli(BioQ Pharma (United States)), Gwanggyu Sun(Palo Alto Institute), Greg Brockman(Film Independent), Daniel Chang(Palo Alto Institute), Alison Fanton(Palo Alto Institute), Gabriel A. Gonzalez(Palo Alto Institute), Samuel H. King(Palo Alto Institute), David B. Li(Palo Alto Institute), Aditi Merchant(Palo Alto Institute), Eric Nguyen(Stanford University), Chiara Ricci-Tam(Palo Alto Institute), David W. Romero(Nvidia (United States)), Jonathan C. Schmok(Palo Alto Institute), Ali Taghibakhshi(Nvidia (United States)), A. B. Vorontsov(Nvidia (United States)), Brandon Yang(Film Independent), Myra Deng(Emodo (United States)), Liv Gorton(Emodo (United States)), Nam‐Ky Nguyen(Emodo (United States)), Nicholas K. Wang(Emodo (United States)), Michael T. Pearce(Emodo (United States)), Elana Simon(Emodo (United States)), Etowah Adams(Columbia University), Zachary Amador(University of Washington), Euan A. Ashley(Stanford University), Stephen A. Baccus(Stanford University), Haoyu Dai(Stanford University), Steven Dillmann(Stanford University), Stefano Ermon(Stanford University), Daniel Guo(Palo Alto Institute), Michael H. Herschl(Palo Alto Institute), Rajesh Ilango(Palo Alto Institute), Ken Janik(Nvidia (United States)), Amy X. Lu(University of California, Berkeley), Reshma Mehta(Palo Alto Institute), Mohammad R. K. Mofrad(University of California, Berkeley), Madelena Y. Ng(Stanford University), Jaspreet Pannu(Johns Hopkins University), Christopher Ré(Stanford University), John St. John(Nvidia (United States)), Jeremy Sullivan(Palo Alto Institute), Joseph Tey(Palo Alto Institute), Ben Viggiano(Stanford University), Kevin Zhu(University of California, Berkeley), Greg Zynda(Nvidia (United States)), Daniel Balsam(Emodo (United States)), Patrick Collison(Palo Alto Institute), Anthony B. Costa(Nvidia (United States)), Tina Hernandez-Boussard(Stanford University), Eric Ho(Emodo (United States)), Mingyu Liu(Nvidia (United States)), Thomas McGrath(Emodo (United States)), Kimberly Powell(Nvidia (United States)), Sudarshan Pinglay(University of Washington), Dave P. Burke(Palo Alto Institute), Hani Goodarzi(University of California, San Francisco), Patrick D. Hsu(Activated Research Company (United States)), Brian Hie(Stanford Health Care)
Nature
March 4, 2026
Cited by 38
Open Access

Abstract

All of life encodes information with DNA. Although tools for genome sequencing, synthesis and editing have transformed biological research, we still understand too little of the immense complexity encoded by genomes to predict the effects of many classes of genomic changes or to intelligently compose new biological systems. Artificial intelligence models that learn from genomic sequences across diverse organisms have steadily advanced prediction and design capabilities [1,2]. Here we introduce Evo 2, a biological foundation model trained on 9 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life, with a 1-million-token context window at single-nucleotide resolution. Evo 2 learns to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific fine-tuning. Mechanistic interpretability analyses reveal that Evo 2 learns representations associated with biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements and prophage genomic regions. Evo 2 generates mitochondrial, prokaryotic and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. It also generates experimentally validated chromatin accessibility patterns when guided by predictive models [3,4] and inference-time search. We have made Evo 2 fully open, including model parameters, training code [5], inference code and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

Evo 2 is an artificial intelligence-based biological foundation model trained on 9 trillion DNA base pairs spanning all domains of life that predicts functional properties from genomic sequences and provides a rich generative model for researchers in biology.

