Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads
Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes remains a substantial challenge because heterozygosity increases the complexity of the de Bruijn graph structures that most assemblers rely on. To meet the increasing demand for sequencing of nonmodel and/or wild-type samples, projects in most cases resort to inbred lines or fosmid-based hierarchical sequencing to overcome this problem. However, these methods are costly and time consuming, forfeiting the advantages of massively parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes, followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from heterozygosity are simplified not only during the contig assembly step but also during the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assemblies with a larger scaffold NG50 length and no accompanying loss of accuracy, in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
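To make the graph-complexity point concrete, the following is a minimal sketch of how a single heterozygous site opens a fork (one side of a "bubble") in a de Bruijn graph. It is not Platanus's implementation: the two haplotype strings, the choice of k = 4, and the function names `de_bruijn_edges` and `branching_nodes` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix node -> its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor, i.e. where the path forks."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical haplotypes that differ by one SNP (G vs T at 0-based index 6).
hap_a = "ACGTCAGGATCCTA"
hap_b = "ACGTCATGATCCTA"
k = 4  # illustrative value only; the abstract says Platanus optimizes k automatically

homozygous = de_bruijn_edges([hap_a], k)
heterozygous = de_bruijn_edges([hap_a, hap_b], k)

print(branching_nodes(homozygous))
print(branching_nodes(heterozygous))
```

With one haplotype the graph is a simple path with no forks. Adding the second haplotype makes the node just before the SNP branch into two parallel paths that rejoin further on. Real genomes contain many such sites, which is the complexity an assembler for heterozygous samples has to simplify.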
The genomic basis of parasitism in the Strongyloides clade of nematodes
Taisei Kikuchi, Mark Viney, Matthew Berriman and colleagues report the genome sequences of six nematode species from the Strongyloides clade, including human and animal pathogens, facultative parasites and a free-living species. They find that expansions of the astacin and SCP/TAPS gene families are associated with parasitism in these species. Soil-transmitted nematodes, including the Strongyloides genus, cause one of the most prevalent neglected tropical diseases. Here we compare the genomes of four Strongyloides species, including the human pathogen Strongyloides stercoralis, and their close relatives that are facultatively parasitic (Parastrongyloides trichosuri) and free-living (Rhabditophanes sp. KR3021). A significant paralogous expansion of key gene families, those encoding astacin-like and SCP/TAPS proteins, is associated with the evolution of parasitism in this clade. Exploiting the unique Strongyloides life cycle, we compare the transcriptomes of the parasitic and free-living stages and find that these same gene families are upregulated in the parasitic stages, underscoring their role in nematode parasitism.
A genetic mechanism for female-limited Batesian mimicry in Papilio butterfly
Haruhiko Fujiwara and colleagues report the genome sequences of two swallowtail butterfly species, Papilio xuthus and Papilio polytes, and the identification of a chromosomal inversion underlying the mimetic phenotype in P. polytes females. The inversion interacts with dsx to control mimetic coloration patterns in an allele-specific manner. In Batesian mimicry, animals avoid predation by resembling distasteful models. In the swallowtail butterfly Papilio polytes, only mimetic-form females resemble the unpalatable butterfly Pachliopta aristolochiae. A recent report showed that a single gene, doublesex (dsx), controls this mimicry¹; however, the detailed molecular mechanisms remain unclear. Here we determined two whole-genome sequences of P. polytes and a related species, Papilio xuthus, identifying a single ∼130-kb autosomal inversion, including dsx, between mimetic (H-type) and non-mimetic (h-type) chromosomes in P. polytes. This inversion is associated with the mimicry-related locus H, as identified by linkage mapping. Knockdown experiments demonstrated that female-specific dsx isoforms expressed from the inverted H allele (dsx(H)) induce mimetic coloration patterns and simultaneously repress non-mimetic patterns. In contrast, dsx(h) does not alter mimetic patterns. We propose that dsx(H) switches the coloration of predetermined wing patterns and that female-limited polymorphism is tightly maintained by chromosomal inversion.
Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
Rei Kajitani, Dai Yoshimura, Miki Okuno et al. | Nature Communications | 2019
The ultimate goal for diploid genome determination is to completely decode homologous chromosomes independently, and several programs that phase haplotypes from consensus sequences have been developed. These methods work well for genomes of low heterozygosity, but many species have high heterozygosity. Additionally, there are highly divergent regions (HDRs), where the haplotype sequences differ considerably. Because HDRs are likely to direct various interesting biological phenomena, many genomic analysis targets fall within these regions. However, they cannot be accessed by existing phasing methods, and we have to adopt costly traditional methods. Here, we develop a de novo haplotype assembler, Platanus-allee ( http://platanus.bio.titech.ac.jp/platanus2 ), which initially constructs each haplotype sequence and then untangles the assembly graphs utilizing sequence links and synteny information. A comprehensive benchmark analysis reveals that Platanus-allee exhibits high recall and precision, particularly for HDRs. Using this approach, previously unknown HDRs are detected in the human genome, which may uncover novel aspects of genome variability.
Coelacanth genomes reveal signatures for evolutionary transition from water to land
Coelacanths are known as "living fossils," as they show remarkable morphological resemblance to the fossil record and belong to the most primitive lineage of living Sarcopterygii (lobe-finned fishes and tetrapods). Coelacanths may be key to elucidating the tempo and mode of evolution from fish to tetrapods. Here, we report the genome sequences of five coelacanths, including four Latimeria chalumnae individuals (three specimens from Tanzania and one from Comoros) and one L. menadoensis individual from Indonesia. These sequences cover two African breeding populations and two known extant coelacanth species. The genome is ∼2.74 Gbp and contains a high proportion (∼60%) of repetitive elements. The genetic diversity among the individuals was extremely low, suggesting a small population size and/or a slow rate of evolution. We found a substantial number of genes that encode olfactory and pheromone receptors with features characteristic of tetrapod receptors for the detection of airborne ligands. We also found that limb enhancers of bmp7 and gli3, both of which are essential for limb formation, are conserved between coelacanth and tetrapods, but not ray-finned fishes. We expect that some tetrapod-like genes may have existed early in the evolution of primitive Sarcopterygii and were later co-opted to adapt to terrestrial environments. These coelacanth genomes will provide a cornerstone for studies to elucidate how ancestral aquatic vertebrates evolved into terrestrial animals.