Towards complete and error-free genome assemblies of all vertebrate species
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
The axolotl genome and the evolution of key tissue formation regulators
Salamanders serve as important tetrapod models for developmental, regeneration and evolutionary studies. An extensive molecular toolkit makes the Mexican axolotl (Ambystoma mexicanum) a key representative salamander for molecular investigations. Here we report the sequencing and assembly of the 32-gigabase-pair axolotl genome using an approach that combined long-read sequencing, optical mapping and development of a new genome assembler (MARVEL). We observed a size expansion of introns and intergenic regions, largely attributable to multiplication of long terminal repeat retroelements. We provide evidence that intron size in developmental genes is under constraint and that species-restricted genes may contribute to limb regeneration. The axolotl genome assembly does not contain the essential developmental gene Pax3. However, mutation of the axolotl Pax3 paralogue Pax7 resulted in an axolotl phenotype that was similar to those seen in Pax3−/− and Pax7−/− mutant mice. The axolotl genome provides a rich biological resource for developmental and evolutionary studies.
Six reference-quality genomes reveal evolution of bat adaptations
Bats possess extraordinary adaptations, including flight, echolocation, extreme longevity and unique immunity. High-quality genomes are crucial for understanding the molecular basis and evolution of these traits. Here we incorporated long-read sequencing and state-of-the-art scaffolding protocols [1] to generate, to our knowledge, the first reference-quality genomes of six bat species (Rhinolophus ferrumequinum, Rousettus aegyptiacus, Phyllostomus discolor, Myotis myotis, Pipistrellus kuhlii and Molossus molossus). We integrated gene projections from our ‘Tool to infer Orthologs from Genome Alignments’ (TOGA) software with de novo and homology gene predictions as well as short- and long-read transcriptomics to generate highly complete gene annotations. To resolve the phylogenetic position of bats within Laurasiatheria, we applied several phylogenetic methods to comprehensive sets of orthologous protein-coding and noncoding regions of the genome, and identified a basal origin for bats within Scrotifera. Our genome-wide screens revealed positive selection on hearing-related genes in the ancestral branch of bats, which is indicative of laryngeal echolocation being an ancestral trait in this clade. We found selection and loss of immunity-related genes (including pro-inflammatory NF-κB regulators) and expansions of anti-viral APOBEC3 genes, which highlights molecular mechanisms that may contribute to the exceptional immunity of bats. Genomic integrations of diverse viruses provide a genomic record of historical tolerance to viral infection in bats. Finally, we found and experimentally validated bat-specific variation in microRNAs, which may regulate bat-specific gene-expression programs. Our reference-quality bat genomes provide the resources required to uncover and validate the genomic basis of adaptations of bats, and stimulate new avenues of research that are directly relevant to human health and disease [1].
Six3 overexpression initiates the formation of ectopic retina
The homeobox gene sine oculis (so) is essential for visual system formation in Drosophila. A vertebrate member of the so/Six gene family, Six3, is expressed in the developing eye and forebrain. Injection of Six3 RNA into medaka fish embryos causes ectopic Pax6 and Rx2 expression in midbrain and cerebellum, resulting in the formation of ectopic retinal primordia. Injected mouse Six3 RNA initiates ectopic expression of endogenous medaka Six3, uncovering a feedback control of Six3 expression. Initiation of ectopic retina formation reveals a pivotal role for Six3 in vertebrate retina development and hints at a conserved regulatory network underlying vertebrate and invertebrate eye development.
The genome of Schmidtea mediterranea and the evolution of core cellular mechanisms
The planarian Schmidtea mediterranea is an important model for stem cell research and regeneration, but adequate genome resources for this species have been lacking. Here we report a highly contiguous genome assembly of S. mediterranea, using long-read sequencing and a de novo assembler (MARVEL) enhanced for low-complexity reads. The S. mediterranea genome is highly polymorphic and repetitive, and harbours a novel class of giant retroelements. Furthermore, the genome assembly lacks a number of highly conserved genes, including critical components of the mitotic spindle assembly checkpoint, but planarians maintain checkpoint function. Our genome assembly provides a key model system resource that will be useful for studying regeneration and the evolutionary plasticity of core cell biological mechanisms.