Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie(National Institutes of Health), Shane McCarthy(University of Cambridge), Olivier Fédrigo(Rockefeller University), Joana Damas(University of California, Davis), Giulio Formenti(Rockefeller University), Sergey Koren(National Institutes of Health), Marcela Uliano‐Silva(Berlin Center for Genomics in Biodiversity Research), William Chow(Wellcome Sanger Institute), Arkarachai Fungtammasan(DNAnexus (United States)), Gregory Gedman(University of California, Davis), Lindsey Cantin(University of California, Davis), Françoise Thibaud‐Nissen(National Center for Biotechnology Information), Leanne Haggerty(European Bioinformatics Institute), Chul Lee(Seoul National University), Byung June Ko(Seoul National University), Juwan Kim(Seoul National University), Iliana Bista(University of Cambridge), Michelle Smith(Wellcome Sanger Institute), Bettina Haase(Rockefeller University), Jacquelyn Mountcastle(Rockefeller University), Sylke Winkler(Center for Systems Biology Dresden), Sadye Paez(Rockefeller University), Jason T. Howard(Rockefeller University), Sonja C. Vernes(Radboud University Nijmegen), Tanya M. Lama(University of Massachusetts Amherst), Frank Grützner(The University of Adelaide), Wesley C. Warren(University of Missouri), Christopher N. Balakrishnan(East Carolina University), David W. Burt(The University of Queensland), Julia M. George(Queen Mary University of London), Mathew Biegler(Rockefeller University), David Iorns, Andrew Digby, Daryl Eason, Taylor Edwards(University of Arizona), Mark Wilkinson(Natural History Museum), George F. Turner(Bangor University), Axel Meyer(University of Konstanz), Andreas F. Kautt(Northeastern University), Paolo Franchini(University of Konstanz), H. William Detrich(Northeastern University), Hannes Svardal(Naturalis Biodiversity Center), Maximilian Wagner(University of Graz), Gavin J. P. Naylor(Florida Museum of Natural History), Martin Pippel(Center for Systems Biology Dresden), Milan Malinsky(University of Basel), Mark P. 
Mooney(Technology Affinity Group), Maria Simbirsky(DNAnexus (United States)), Brett T. Hannigan(DNAnexus (United States)), Trevor Pesout(University of California, Santa Cruz), Marlys L. Houck(Zoological Society of San Diego), Ann Misuraca(Pacific Biosciences (United States)), Sarah B. Kingan(Pacific Biosciences (United States)), Richard Hall(Pacific Biosciences (United States)), Zev Kronenberg(Pacific Biosciences (United States)), Jonas Korlach(Pacific Biosciences (United States)), Ivan Sović(Pacific Biosciences (United States)), Christopher Dunn(Pacific Biosciences (United States)), Zemin Ning(Wellcome Sanger Institute), Alex Hastie(BioNano Genomics (United States)), Joyce Lee(BioNano Genomics (United States)), Siddarth Selvaraj(Arima Genomics (United States)), Richard E. Green(University of California, Santa Cruz), Nicholas H. Putnam, Jay Ghurye(Dovetail Genomics (United States)), Erik Garrison(University of California, Santa Cruz), Ying Sims(Wellcome Sanger Institute), Joanna Collins(Wellcome Sanger Institute), Sarah Pelan(Wellcome Sanger Institute), James Torrance(Wellcome Sanger Institute), Alan Tracey(Wellcome Sanger Institute), Jonathan Wood(Wellcome Sanger Institute), Dengfeng Guan(Harbin Institute of Technology), Sarah E. London(University of Chicago), David F. Clayton(Queen Mary University of London), Claudio V. Mello(Oregon Health & Science University), Samantha R. Friedrich(Oregon Health & Science University), Peter V. Lovell(Oregon Health & Science University), Ekaterina Osipova(Max Planck Institute for the Physics of Complex Systems), Farooq O. Al-Ajli(Monash University Malaysia), Simona Secomandi(University of Milan), Heebal Kim(Seoul National University), Constantina Theofanopoulou(Rockefeller University), Yang Zhou(University of Copenhagen), Robert S. Harris(Pennsylvania State University), Kateryna D. 
Makova(Pennsylvania State University), Paul Medvedev(Pennsylvania State University), Jinna Hoffman(National Center for Biotechnology Information), Patrick Masterson(National Center for Biotechnology Information), Karen Clark(National Center for Biotechnology Information), Fergal J. Martin(European Bioinformatics Institute), Kevin Howe(European Bioinformatics Institute), Paul Flicek(European Bioinformatics Institute), Brian P. Walenz(National Institutes of Health), Woori Kwak(CrystalGenomics (South Korea)), Hiram Clawson(University of California, Santa Cruz), Mark Diekhans(University of California, Santa Cruz), Luis R. Nassar(University of California, Santa Cruz), Benedict Paten(University of California, Santa Cruz), R.H. Kraus(University of Konstanz), Harris A. Lewin(John Muir Health), Andrew J. Crawford(Universidad de Los Andes), M. Thomas P. Gilbert(University of Copenhagen), Guojie Zhang(University of Copenhagen), Byrappa Venkatesh(Agency for Science, Technology and Research), Robert W. Murphy(Royal Ontario Museum), Klaus‐Peter Koepfli(Smithsonian Conservation Biology Institute), Beth Shapiro(Howard Hughes Medical Institute), Warren E. Johnson(Smithsonian Institution), Federica Di Palma(University of East Anglia), Tomas Marques-Bonet(Universitat Pompeu Fabra), Emma C. Teeling(University College Dublin), Tandy Warnow(University of Illinois Urbana-Champaign), Jennifer A. Marshall Graves(La Trobe University), Oliver A. Ryder(Zoological Society of San Diego), David Haussler(Howard Hughes Medical Institute), Stephen J. O’Brien(ITMO University), Kerstin Howe(Wellcome Sanger Institute), Eugene W. Myers(University of Basel), Richard Durbin(University of Cambridge), Adam M. Phillippy(National Institutes of Health), Erich D. Jarvis(Howard Hughes Medical Institute)
bioRxiv (Cold Spring Harbor Laboratory)
May 23, 2020
Cited by 195
Open Access

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

