A draft human pangenome reference
Abstract: Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Pangenome graph construction from genome alignments with Minigraph-Cactus
Glenn Hickey, Jean Monlong, Jana Ebler et al.|Nature Biotechnology|2023
Semi-automated assembly of high-quality diploid human reference genomes
Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Environmental variation and rivers govern the structure of chimpanzee genetic diversity in a biodiversity hotspot
BACKGROUND: The mechanisms that underlie the diversification of tropical animals remain poorly understood, but new approaches that combine geo-spatial modeling with spatially explicit genetic data are providing fresh insights on this topic. Data about the diversification of tropical mammals remain particularly sparse, and vanishingly few opportunities exist to study endangered large mammals that increasingly exist only in isolated pockets. The chimpanzees of Cameroon represent a unique opportunity to examine the mechanisms that promote genetic differentiation in tropical mammals because the region is home to two chimpanzee subspecies: Pan troglodytes ellioti and P. t. troglodytes. Their ranges converge in central Cameroon, which is a geographically, climatically and environmentally complex region that presents an unparalleled opportunity to examine the roles of rivers and/or environmental variation in influencing the evolution of chimpanzee populations.
RESULTS: We analyzed microsatellite genotypes and mtDNA HVRI sequencing data from wild chimpanzees sampled at a fine geographic scale across Cameroon and eastern Nigeria using a spatially explicit approach based upon Generalized Dissimilarity Modeling. Both the Sanaga River and environmental variation were found to contribute to driving separation of the subspecies. The importance of environmental variation differed among subspecies. Gene-environment associations were weak in P. t. troglodytes, whereas environmental variation was found to play a much larger role in shaping patterns of genetic differentiation in P. t. ellioti.
CONCLUSIONS: We found that both the Sanaga River and environmental variation likely play a role in shaping patterns of chimpanzee genetic diversity. Future studies using single nucleotide polymorphism (SNP) data are necessary to further understand how rivers and environmental variation contribute to shaping patterns of genetic variation in chimpanzees.
Evidence from Cameroon reveals differences in the genetic structure and histories of chimpanzee populations
Mary Katherine Gonder, Sabrina Locatelli, Lora Ghobrial et al.|Proceedings of the National Academy of Sciences|2011
The history of the genus Pan is a topic of enduring interest. Chimpanzees (Pan troglodytes) are often divided into subspecies, but the population structure and genetic history of chimpanzees across Africa remain unclear. Some population genetics studies have led to speculation that, until recently, this species constituted a single population with ongoing gene flow across its range, which resulted in a continuous gradient of allele frequencies. Chimpanzees, designated here as P. t. ellioti, occupy the Gulf of Guinea region that spans southern Nigeria and western Cameroon at the center of the distribution of this species. Remarkably, few studies have included individuals from this region, hindering the examination of chimpanzee population structure across Africa. Here, we analyzed microsatellite genotypes of 94 chimpanzees, including 32 designated as P. t. ellioti. We find that chimpanzees fall into three major populations: (i) Upper Guinea in western Africa (P. t. verus); (ii) the Gulf of Guinea region (P. t. ellioti); and (iii) equatorial Africa (P. t. troglodytes and P. t. schweinfurthii). Importantly, the Gulf of Guinea population is significantly different genetically from the others, sharing a last common ancestor with the populations in Upper Guinea ~0.46 million years ago (mya) and equatorial Africa ~0.32 mya. Equatorial chimpanzees are subdivided into up to three populations occupying southern Cameroon, central Africa, and eastern Africa, which may have constituted a single population until ~0.10-0.11 mya. Finally, occasional hybridization may be occurring between the Gulf of Guinea and southern Cameroon populations.