Complex genetic variation in nearly complete human genomes

Glennis A. Logsdon (University of Washington), Peter Ebert (Düsseldorf University Hospital), Peter A. Audano (Jackson Laboratory), Mark Loftus (Center for Human Genetics), David Porubský (University of Washington), Jana Ebler (Düsseldorf University Hospital), Feyza Yilmaz (Jackson Laboratory), Pille Hallast (Jackson Laboratory), Timofey Prodanov (Düsseldorf University Hospital), DongAhn Yoo (University of Washington), Carolyn Paisie (Jackson Laboratory), William T. Harvey (University of Washington), Xuefang Zhao (Broad Institute), Gianni V. Martino (Medical University of South Carolina), Mir Henglin (Düsseldorf University Hospital), Katherine M. Munson (University of Washington), K Siddique-e Rabbani (University of Southern California), Chen-Shan Chin (Program for Appropriate Technology in Health), Bida Gu (University of Southern California), Hufsah Ashraf (Düsseldorf University Hospital), Stephan Scholz (Heinrich Heine University Düsseldorf), Olanrewaju Austine-Orimoloye (European Bioinformatics Institute), Parithi Balachandran (Jackson Laboratory), Marc Jan Bonder (University Medical Center Groningen), Haoyu Cheng (Yale University), Zechen Chong (University of Alabama), Jonathan Crabtree (University of Maryland, Baltimore), Mark Gerstein (Yale University), Lisbeth A. Guethlein (Stanford University), Patrick Hasenfeld (European Molecular Biology Laboratory), Glenn Hickey (University of California, Santa Cruz), Kendra Hoekzema (University of Washington), Sarah Hunt (European Bioinformatics Institute), Matthew Jensen (Yale University), Yunzhe Jiang (Yale University), Sergey Koren (National Institutes of Health), Young-Jun Kwon (University of Washington), Chong Li (Temple University), Heng Li (Harvard University), Jiaqi Li (Yale University), Paul J. Norman (University of Colorado Denver), Keisuke K. Oshima (University of Pennsylvania), Benedict Paten (University of California, Santa Cruz), Adam M. Phillippy (National Institutes of Health), Nicholas R. Pollock (University of Colorado Denver), Tobias Rausch (European Molecular Biology Laboratory), Mikko Rautiainen (University of Helsinki), Yuwei Song (University of Alabama), Arda Söylev (Düsseldorf University Hospital), Arvis Sulovari (University of Washington), Likhitha Surapaneni (European Bioinformatics Institute), Vasiliki Tsapalou (European Molecular Biology Laboratory), Weichen Zhou (University of Michigan), Ying Zhou (Dana-Farber Cancer Institute), Qihui Zhu (Stanford Health Care), Michael C. Zody (New York Genome Center), Ryan E. Mills (University of Michigan), Scott E. Devine (University of Maryland, Baltimore), Xinghua Shi (Temple University), Michael E. Talkowski (Broad Institute), Mark Chaisson (University of Southern California), Alexander Dilthey (Heinrich Heine University Düsseldorf), Miriam K. Konkel (Clemson University), Jan O. Korbel (European Bioinformatics Institute), Charles Lee (Jackson Laboratory), Christine R. Beck (Jackson Laboratory), Evan E. Eichler (Howard Hughes Medical Institute), Tobias Marschall (Düsseldorf University Hospital)
Nature
July 23, 2025
Open Access

Abstract

Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (median continuity of 130 Mb), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8 and AMY1/AMY2, and fully resolve 1,852 complex structural variants. In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite higher-order repeat array length and characterize the pattern of mobile element insertions into α-satellite higher-order repeat arrays. Although most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value of 45. Using this approach, 26,115 structural variants per individual are detected, substantially increasing the number of structural variants now amenable to downstream disease association studies.
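A quality value (QV) is conventionally Phred-scaled, QV = -10 * log10(e), where e is the per-base error rate. The abstract does not say how e was estimated for the inferred genomes, so the minimal sketch below assumes only this standard Phred definition (the function names `phred_qv` and `error_rate_from_qv` are illustrative, not from the paper) to show what a median QV of 45 implies in error-rate terms.

```python
import math


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate).

    Assumes the standard Phred definition; how the error rate itself is
    estimated (e.g. from k-mers or alignments) is outside this sketch.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in (0, 1)")
    return -10.0 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Inverse of phred_qv: the per-base error rate implied by a given QV."""
    return 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    err = error_rate_from_qv(45)
    print(f"error rate at QV 45: {err:.3e}")          # 3.162e-05
    print(f"about 1 error per {1 / err:,.0f} bp")      # about 1 error per 31,623 bp
```

Under this definition every 10 QV units correspond to a tenfold drop in error rate, so QV 45 falls between one error per 10 kb (QV 40) and one error per 100 kb (QV 50).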

