Towards complete and error-free genome assemblies of all vertebrate species
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species 1–4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Improved reference genome of Aedes aegypti informs arbovirus vector control
Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.
De novo assembly of haplotype-resolved genomes with trio binning
Sergey Koren, Arang Rhie, Brian P. Walenz et al. | Nature Biotechnology | 2018
Speciation in birds: Genes, geography, and sexual selection
Scott V. Edwards, Sarah B. Kingan, Jennifer D. Calkins et al. | Proceedings of the National Academy of Sciences | 2005
Molecular studies of speciation in birds over the last three decades have been dominated by a focus on the geography, ecology, and timing of speciation, a tradition traceable to Mayr's Systematics and the Origin of Species. However, in recent years, interest in the behavioral and molecular mechanisms of speciation in birds has increased, building in part on the older traditions and observations from domesticated species. The result is that many of the same mechanisms proffered for model lineages such as Drosophila (mechanisms such as genetic incompatibilities, reinforcement, and sexual selection) are now being seriously entertained for birds, albeit with much lower resolution. The recent completion of a draft sequence of the chicken genome, and an abundance of single-nucleotide polymorphisms on the autosomes and sex chromosomes, will dramatically accelerate research on the molecular mechanisms of avian speciation over the next few years. The challenge for ornithologists is now to inform well-studied examples of speciation in nature with increased molecular resolution (to clone speciation genes, if they exist) and thereby evaluate the relative roles of extrinsic, intrinsic, deterministic, and stochastic causes for avian diversification.
The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum
Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has a weighted average (N50) contig size of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components.
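The N50 contig size quoted in the wheat abstract can be illustrated with a short sketch. It assumes the usual definition of N50 (the largest length L such that contigs of length L or longer together contain at least half of the assembled bases); the abstract does not spell out its exact computation, and the function name `n50` and the toy contig lengths are hypothetical, chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not data from any assembly).
contigs = [100, 80, 60, 40, 20]
print(n50(contigs))  # 80: the two longest contigs cover 180 of 300 bases
```

Because the metric is weighted by contig length, a few long contigs raise N50 far more than many short ones, which is why it is commonly reported as a measure of assembly contiguity.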