High-quality draft assemblies of mammalian genomes from massively parallel sequence data
Sante Gnerre, Iain MacCallum, Dariusz Przybylski et al. | Proceedings of the National Academy of Sciences | 2010
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
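The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. The following is a minimal sketch of that calculation; the function name and the example lengths are made up for illustration and do not come from the ALLPATHS-LG software or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the assembly has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_scaffolds))
```

Because N50 is weighted by length, a few very long scaffolds can raise it sharply, which is why it is usually reported alongside accuracy and genome coverage rather than on its own.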
Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans
The genome of Phytophthora infestans, the pathogen that triggered the Irish potato famine in the nineteenth century, has been sequenced. It remains a devastating pathogen, with late blight destroying crops worth billions of dollars each year. Blight is difficult to control, in part because it adapts so quickly to genetically resistant potato strains. Comparison with two other Phytophthora genomes shows rapid turnover and extensive expansion of specific families of secreted disease effector proteins, including many genes induced during infection that have activities thought to alter host physiology. These fast-evolving effector genes are found in highly dynamic and expanded regions of the genome, a factor that may contribute to its rapid adaptability to host plants. The P. infestans genome is the biggest chromalveolate genome so far sequenced, at about 240 megabases, with an extremely high repeat content of close to 75%. P. infestans is a model organism for the oomycetes, a distinct lineage of fungus-like eukaryotes related to organisms such as brown algae and diatoms.
Phytophthora infestans is a fungus-like eukaryote and the most destructive pathogen of potato, with current annual worldwide potato crop losses due to late blight estimated at $6.7 billion. Here, the sequence of the P. infestans genome is reported. Comparison with two other Phytophthora genomes showed rapid turnover and extensive expansion of certain secreted disease effector proteins, probably explaining the rapid adaptability of the pathogen to host plants.
Phytophthora infestans is the most destructive pathogen of potato and a model organism for the oomycetes, a distinct lineage of fungus-like eukaryotes that are related to organisms such as brown algae and diatoms. As the agent of the Irish potato famine in the mid-nineteenth century, P. infestans has had a tremendous effect on human history, resulting in famine and population displacement [1].
To this day, it affects world agriculture by causing the most destructive disease of potato, the fourth largest food crop and a critical alternative to the major cereal crops for feeding the world’s population [1]. Current annual worldwide potato crop losses due to late blight are conservatively estimated at $6.7 billion [2]. Management of this devastating pathogen is challenged by its remarkable speed of adaptation to control strategies such as genetically resistant cultivars [3,4]. Here we report the sequence of the P. infestans genome, which at ∼240 megabases (Mb) is by far the largest and most complex genome sequenced so far in the chromalveolates. Its expansion results from a proliferation of repetitive DNA accounting for ∼74% of the genome. Comparison with two other Phytophthora genomes showed rapid turnover and extensive expansion of specific families of secreted disease effector proteins, including many genes that are induced during infection or are predicted to have activities that alter host physiology. These fast-evolving effector genes are localized to highly dynamic and expanded regions of the P. infestans genome. This probably plays a crucial part in the rapid adaptability of the pathogen to host plants and underpins its evolutionary potential.
Whole-genome resequencing reveals loci under selection during chicken domestication
The domestication of the chicken over a period of several thousand years and its later specialization into meat producing (broiler) and egg producing (layer) lines is an informative model of domestication and phenotypic evolution. A study using massively parallel sequencing of domestic chicken and its wild ancestor, the red jungle fowl, reveals a number of 'selective sweeps', where benign genetic variations closely linked to a mutation that dramatically enhances survival increase in frequency relative to other alleles. Most striking of these, found in all domestic chickens, is one at a locus encoding thyroid stimulating hormone receptor, which has a key role in metabolism and vertebrate reproductive timing. This sweep may be related to a classic feature of domesticated animals, the absence of the strict regulation of seasonal reproduction found in wild populations. Several of the selective sweeps detected in broilers overlap genes associated with growth, appetite and metabolic regulation.
Here, the genomes of birds representing eight populations of domestic chickens are compared with the genome of their wild ancestor, the red jungle fowl. The results reveal selective sweeps of favourable alleles and mutations that may have contributed to domestication. One selective sweep, for instance, occurred at the locus encoding the thyroid stimulating hormone receptor, which is important in metabolism and in the timing of vertebrate reproduction.
Domestic animals are excellent models for genetic studies of phenotypic evolution [1,2,3]. They have evolved genetic adaptations to a new environment, the farm, and have been subjected to strong human-driven selection leading to remarkable phenotypic changes in morphology, physiology and behaviour.
Identifying the genetic changes underlying these developments provides new insight into general mechanisms by which genetic variation shapes phenotypic diversity. Here we describe the use of massively parallel sequencing to identify selective sweeps of favourable alleles and candidate mutations that have had a prominent role in the domestication of chickens (Gallus gallus domesticus) and their subsequent specialization into broiler (meat-producing) and layer (egg-producing) chickens. We have generated 44.5-fold coverage of the chicken genome using pools of genomic DNA representing eight different populations of domestic chickens as well as red jungle fowl (Gallus gallus), the major wild ancestor [4]. We report more than 7,000,000 single nucleotide polymorphisms, almost 1,300 deletions and a number of putative selective sweeps. One of the most striking selective sweeps found in all domestic chickens occurred at the locus for thyroid stimulating hormone receptor (TSHR), which has a pivotal role in metabolic regulation and photoperiod control of reproduction in vertebrates. Several of the selective sweeps detected in broilers overlapped genes associated with growth, appetite and metabolic regulation. We found little evidence that selection for loss-of-function mutations had a prominent role in chicken domestication, but we detected two deletions in coding sequences that we suggest are functionally important. This study has direct application to animal breeding and enhances the importance of the domestic chicken as a model organism for biomedical research.
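A selective sweep leaves a region of unusually low genetic diversity, and with pooled sequencing one way to look for such regions is to compute a pooled heterozygosity score per genomic window from allele read counts. The abstract above does not say which statistic was used, so the following is only an illustrative sketch under that assumption; the function name and the read counts are hypothetical.

```python
def pooled_heterozygosity(major_counts, minor_counts):
    """Pooled heterozygosity for one genomic window.

    major_counts / minor_counts hold, for each SNP in the window, the
    number of reads supporting the major and the minor allele.
    The score is 2 * N_major * N_minor / (N_major + N_minor) ** 2,
    where N_major and N_minor are the sums over all SNPs in the window.
    It is 0.5 when both alleles are equally common and 0 when one
    allele is fixed.
    """
    n_major = sum(major_counts)
    n_minor = sum(minor_counts)
    total = n_major + n_minor
    if total == 0:
        raise ValueError("window contains no reads")
    return 2 * n_major * n_minor / total ** 2


# Hypothetical read counts for two windows of three SNPs each.
diverse_window = pooled_heterozygosity([12, 10, 11], [9, 10, 8])
swept_window = pooled_heterozygosity([20, 19, 21], [0, 1, 0])
print(diverse_window, swept_window)
```

In a scan, windows whose score falls far below the genome-wide distribution would be flagged as candidate sweeps; the cutoff is a separate design choice not covered by this sketch.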
A structural variation reference for medical and population genetics
Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
The genomic substrate for adaptive radiation in African cichlid fish
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
Genomes and transcriptomes of five distinct lineages of African cichlids, a textbook example of adaptive radiation, have been sequenced and analysed to reveal that many types of molecular changes contributed to rapid evolution, and that standing variation accumulated during periods of relaxed selection may have primed subsequent diversification.
The 2,000 or so species of cichlid fish, to be found in the lakes and rivers of Africa's Rift Valley, provide the classic example of adaptive radiations. This large-scale international collaboration has sequenced and analysed the genomes and transcriptomes of five distinct lineages of African cichlids. The data reveal an excess of gene duplications in comparison to other fish species. There is an abundance of non-coding element divergence; accelerated coding sequence evolution; expression divergence associated with transposable element insertions in orthologous gene pairs; and regulation by novel miRNAs. Sequencing data from sixty individuals from six closely related Lake Victoria species point to rapid cichlid speciation associated with genome-wide diversifying selection on coding and regulatory variants, and imply that ancient periods of relaxed purifying selection enabled the accumulation of standing variation, which may have been important in facilitating diversification.