An integrated map of structural variation in 2,504 human genomes
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
The Structural Variation Analysis Group of The 1000 Genomes Project reports an integrated structural variation map based on discovery and genotyping of eight major structural variation classes in 2,504 unrelated individuals from across 26 populations; structural variation is compared within and between populations and its functional impact is quantified.
The authors further create a phased reference panel that will be valuable for population genetic and disease association studies.
Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes
As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
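The rarity cutoff above (minor allele frequency below 0.5%) and the per-genome percentage can be made concrete with a small calculation. This is an illustrative sketch, not the authors' pipeline: the helper names are hypothetical, and it assumes biallelic autosomal sites in diploid individuals, so 2 × 2440 = 4880 alleles are observed per site when every individual is called.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumes biallelic autosomal sites and diploid individuals.

RARE_MAF_THRESHOLD = 0.005  # "minor allele frequency less than 0.5%"


def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Return the frequency of the less common of the two alleles at a site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("counts must satisfy 0 <= alt_allele_count <= total_alleles > 0")
    alt_freq = alt_allele_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(maf: float, threshold: float = RARE_MAF_THRESHOLD) -> bool:
    """Flag a variant as rare when its MAF falls strictly below the threshold."""
    return maf < threshold


# 2440 diploid individuals contribute 2 * 2440 = 4880 alleles per site.
TOTAL_ALLELES = 2 * 2440

singleton_maf = minor_allele_frequency(1, TOTAL_ALLELES)   # seen once
common_maf = minor_allele_frequency(500, TOTAL_ALLELES)    # seen 500 times

# Per-genome arithmetic from the abstract: 2.3% of 13,595 SNVs.
per_genome_estimate = round(0.023 * 13595)

print(f"singleton MAF = {singleton_maf:.5f}, rare = {is_rare(singleton_maf)}")
print(f"500-copy MAF  = {common_maf:.5f}, rare = {is_rare(common_maf)}")
print(f"2.3% of 13,595 is about {per_genome_estimate}")
```

Under these assumptions, a variant observed once among 4,880 alleles has a MAF of about 0.02%, far below the 0.5% cutoff, and 2.3% of 13,595 comes to about 313, matching the per-genome figure quoted in the abstract.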
Multi-platform discovery of haplotype-resolved structural variation in human genomes
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
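The abstract separates indels (<50 bp) from SVs (≥50 bp) purely by size. A minimal sketch of that cutoff follows, assuming sequence-resolved insertion/deletion calls given as VCF-style REF/ALT strings. The helper names are hypothetical, and the length rule used here (difference in allele lengths) does not apply to balanced events such as inversions, whose size is defined by their breakpoints rather than by a change in sequence length.

```python
# Hypothetical helpers illustrating the <50 bp / >=50 bp split in the abstract.
# Assumes simple, sequence-resolved insertions/deletions (VCF-style REF/ALT).

SV_MIN_LENGTH_BP = 50  # variants of at least this many bp count as SVs


def variant_length(ref: str, alt: str) -> int:
    """Size of a simple insertion or deletion as the difference in allele lengths."""
    return abs(len(ref) - len(alt))


def classify_by_length(length_bp: int) -> str:
    """Return 'indel' for events shorter than 50 bp and 'SV' otherwise."""
    if length_bp <= 0:
        raise ValueError("variant length must be positive")
    return "indel" if length_bp < SV_MIN_LENGTH_BP else "SV"


# Toy calls: a 1 bp deletion, a 60 bp deletion, and insertions of 49 and 50 bp.
calls = [
    ("AT", "A"),
    ("A" + "G" * 60, "A"),
    ("C", "C" + "T" * 49),
    ("C", "C" + "T" * 50),
]

for ref, alt in calls:
    size = variant_length(ref, alt)
    print(size, classify_by_length(size))
```

The 49 bp and 50 bp insertions sit on either side of the cutoff, which is why the comparison uses a strict "less than" for indels, mirroring the "<50 bp" and "≥50 bp" wording.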
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants
Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data
Goo Jun, Matthew Flickinger, Kurt N. Hetrick et al. | The American Journal of Human Genetics | 2012