High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Marta Byrska-Bishop (New York Genome Center), Uday S. Evani (New York Genome Center), Xuefang Zhao (Broad Institute), Anna O. Basile (New York Genome Center), Haley Abel (James S. McDonnell Foundation), Allison Regier (James S. McDonnell Foundation), André Corvelo (New York Genome Center), Wayne E. Clarke (New York Genome Center), Rajeeva Musunuri (New York Genome Center), Kshithija Nagulapalli (New York Genome Center), Susan Fairley (European Bioinformatics Institute), Alexi Runnels (New York Genome Center), Lara Winterkorn (New York Genome Center), Ernesto Lowy (European Bioinformatics Institute), Evan E. Eichler (European Bioinformatics Institute), Jan O. Korbel (New York Genome Center), Charles Lee (Broad Institute), Tobias Marschall (James S. McDonnell Foundation), Scott E. Devine (Broad Institute), William T. Harvey (New York Genome Center), Weichen Zhou (New York Genome Center), Ryan E. Mills, Tobias Rausch, Sushant Kumar, Can Alkan, Fereydoun Hormozdiari, Zechen Chong, Yu Chen, Xiaofei Yang, Jiadong Lin, Mark Gerstein, Kai Ye, Qihui Zhu, Feyza Yilmaz, Chunlin Xiao, Paul Flicek (European Bioinformatics Institute), Søren Germer (New York Genome Center), Harrison Brand (Broad Institute), Ira M. Hall (Washington University in St. Louis), Michael E. Talkowski (Broad Institute), Giuseppe Narzisi (New York Genome Center), Michael C. Zody (New York Genome Center)
Cell
September 1, 2022
Cited by 1,025 | Open Access

Abstract

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs, as well as among INDELs and SVs across the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.

