Complete vertebrate mitogenomes reveal widespread repeats and gene duplications

Giulio Formenti (Howard Hughes Medical Institute), Arang Rhie (National Institutes of Health), Jennifer Balacco (Rockefeller University), Bettina Haase (Rockefeller University), Jacquelyn Mountcastle (Rockefeller University), Olivier Fédrigo (Rockefeller University), Samara Brown (Howard Hughes Medical Institute), Marco Rosario Capodiferro (University of Pavia), Farooq O. Al-Ajli (Monash University Malaysia), Roberto Ambrosini (University of Milan), Peter Houde (New Mexico State University), Sergey Koren (National Institutes of Health), Karen Oliver (Wellcome Sanger Institute), Michelle Smith (Wellcome Sanger Institute), Jason Skelton (Wellcome Sanger Institute), Emma Betteridge (Wellcome Sanger Institute), Jale Dolucan (Wellcome Sanger Institute), Craig Corton (Wellcome Sanger Institute), Iliana Bista (University of Cambridge), James Torrance (Wellcome Sanger Institute), Alan Tracey (Wellcome Sanger Institute), Jonathan Wood (Wellcome Sanger Institute), Marcela Uliano‐Silva (Wellcome Sanger Institute), Kerstin Howe (Wellcome Sanger Institute), Shane McCarthy (University of Cambridge), Sylke Winkler (Max Planck Institute of Molecular Cell Biology and Genetics), Woori Kwak, Jonas Korlach (Pacific Biosciences (United States)), Arkarachai Fungtammasan (DNAnexus (United States)), Daniel Fordham (Oxford Nanopore Technologies (United Kingdom)), Vânia Costa (Oxford Nanopore Technologies (United Kingdom)), Simon Mayes (Oxford Nanopore Technologies (United Kingdom)), Matteo Chiara (University of Milan), David S. Horner (University of Milan), Eugene W. Myers (Max Planck Institute of Molecular Cell Biology and Genetics), Richard Durbin (University of Cambridge), Alessandro Achilli (University of Pavia), Edward L. Braun (University of Florida), Adam M. Phillippy (National Institutes of Health), Erich D. Jarvis (Howard Hughes Medical Institute)
Genome Biology
April 28, 2021
Cited by 144
Open Access

Abstract

BACKGROUND: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.

RESULTS: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.

CONCLUSIONS: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
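
The idea of similarity-based identification of mitochondrial reads, named in the results, can be illustrated with a toy example. The sketch below is not the mitoVGP implementation, which the abstract does not detail; it swaps in a simple shared k-mer fraction as the similarity measure, and the function names, `k`, and the `min_shared` threshold are all hypothetical choices made for illustration. It keeps a read if enough of its k-mers also occur in a reference mitogenome from a related species, on either strand.

```python
from typing import Dict, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_mito_reads(
    reads: Dict[str, str],
    reference: str,
    k: int = 11,              # hypothetical k-mer size
    min_shared: float = 0.5,  # hypothetical similarity threshold
) -> List[str]:
    """Return names of reads whose k-mers largely occur in the reference.

    Both strands of the reference are indexed, since reads can come
    from either strand.
    """
    ref_kmers = kmers(reference, k) | kmers(revcomp(reference), k)
    selected = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k: no evidence either way
        shared = len(read_kmers & ref_kmers) / len(read_kmers)
        if shared >= min_shared:
            selected.append(name)
    return selected
```

In practice a read aligner would take the place of the k-mer comparison, and the selected reads would then be passed to a de novo assembler; the sketch only shows the filtering principle.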

