A complete diploid human genome benchmark for personalized genomics

Nancy F. Hansen (National Institutes of Health), Nathan Dwarshuis (National Institute of Standards and Technology), Hyun Joo Ji (Johns Hopkins University), Arang Rhie (National Institutes of Health), Hailey Loucks (University of California, Santa Cruz), Glennis A. Logsdon (University of Pennsylvania), Mitchell R. Vollger (University of Washington Medical Center), Jessica M. Storer (University of Connecticut), Juhyun Kim (National Institutes of Health), Eleni Adam (Old Dominion University), Nicolas Altemose (Chan Zuckerberg Initiative (United States)), Dmitry Antipov (National Institutes of Health), Mobin Asri (University of California, Santa Cruz), Sofia N. Barreira (National Human Genome Research Institute), Stephanie C. Bohaczuk (University of Washington Medical Center), Andrey V. Bzikadze (University of California San Diego), Sara A. Carioscia (Johns Hopkins University), Andrew Carroll (Google (United States)), Kuan-Hao Chao (Johns Hopkins University), Yanan Chu (Chinese Academy of Sciences), Arun Das (Johns Hopkins University), Peter Ebert (Düsseldorf University Hospital), Adam C. English (Baylor College of Medicine), Mark Fleharty (Broad Institute), Laura E. Fleming (Broad Institute), Giulio Formenti (Rockefeller University), Andrea Guarracino (University of Tennessee Health Science Center), Gabrielle A. Hartley (University of Connecticut), Katharine M. Jenike (Johns Hopkins University), Jenna Kalleberg (University of Missouri), Yu Kang (Chinese Academy of Sciences), Robert C. King (Oxford Nanopore Technologies (United Kingdom)), Josipa Lipovac (University of Zagreb), Mira Mastoras (University of California, Santa Cruz), Matthew W. Mitchell (Coriell Institute For Medical Research), Shloka Negi (University of California, Santa Cruz), Nathan D. Olson (National Institute of Standards and Technology), Keisuke K. Oshima (University of Pennsylvania), Luis F. Paulin (Baylor College of Medicine), Brandon D. Pickett (National Institutes of Health), David Porubský (University of Washington), Jane Ranchalis (University of Washington Medical Center), Desh Ranjan (Old Dominion University), Mikko Rautiainen (University of Helsinki), Harold Riethman (Old Dominion University), Robert D. Schnabel (University of Missouri), Fritz J. Sedlazeck (Baylor College of Medicine), Kishwar Shafin (Google (United States)), Mile Šikić (Agency for Science, Technology and Research), Steven J. Solar (National Institutes of Health), Alexander P. Sweeten (National Institutes of Health), Winston Timp (Johns Hopkins University), Justin Wagner (National Institute of Standards and Technology), DongAhn Yoo (University of Washington), Ying Zhou (Dana-Farber Cancer Institute), Erik Garrison (University of Tennessee Health Science Center), Evan E. Eichler (Howard Hughes Medical Institute), Michael C. Schatz (Johns Hopkins University), Andrew B. Stergachis (University of Washington), Rachel J. O’Neill (University of Connecticut), Karen H. Miga (University of California, Santa Cruz), Steven L. Salzberg (Johns Hopkins University), Sergey Koren (National Institutes of Health), Justin M. Zook (National Institute of Standards and Technology), Adam M. Phillippy (National Institutes of Health)
bioRxiv (Cold Spring Harbor Laboratory)
September 21, 2025
Cited by 20 | Open Access

Abstract

Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and structurally polymorphic regions of the genome unmapped. Consequently, existing variant benchmarks, generated by the same methods, fail to assess these complex regions. To address this limitation, we present a telomere-to-telomere genome benchmark that achieves near-perfect accuracy (i.e., no detectable errors) across 99.4% of the complete, diploid HG002 genome. This benchmark adds 701.4 Mb of autosomal sequence and both sex chromosomes (216.8 Mb), totaling 15.3% of the genome that was absent from prior benchmarks. We also provide a diploid annotation of genes, transposable elements, segmental duplications, and satellite repeats, including 39,144 protein-coding genes across both haplotypes. To facilitate application of the benchmark, we developed tools for measuring the accuracy of sequencing reads, phased variant call sets, and genome assemblies against a diploid reference. Genome-wide analyses show that state-of-the-art de novo assembly methods resolve 2–7% more sequence and are an order of magnitude more accurate than variant calling, yielding just one error per 100 kb across 99.9% of the benchmark regions. Adoption of genome-based benchmarking is expected to accelerate the development of cost-effective methods for complete genome sequencing, expanding the reach of genomic medicine to the entire genome and enabling a new era of personalized genomics.
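
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper's tooling: the variable names are hypothetical, the implied diploid genome size is an inference from the stated 15.3% share, and the Phred-scaled quality value (QV) is a standard conversion that the abstract itself does not report.

```python
import math

# Figures stated in the abstract.
autosomal_mb = 701.4      # newly covered autosomal sequence (Mb)
sex_chrom_mb = 216.8      # both sex chromosomes (Mb)
added_fraction = 0.153    # share of the genome absent from prior benchmarks

# Total newly benchmarked sequence.
added_mb = autosomal_mb + sex_chrom_mb

# Assumption: the 15.3% refers to the full diploid genome, so the
# implied genome size is the added sequence divided by that share.
implied_genome_mb = added_mb / added_fraction


def phred_qv(errors: float, bases: float) -> float:
    """Standard Phred scaling: QV = -10 * log10(error rate)."""
    return -10 * math.log10(errors / bases)


# "One error per 100 kb" expressed as a Phred-scaled quality value.
assembly_qv = phred_qv(1, 100_000)

print(f"Added sequence: {added_mb:.1f} Mb")
print(f"Implied diploid genome size: {implied_genome_mb / 1000:.1f} Gb")
print(f"QV for 1 error per 100 kb: {assembly_qv:.0f}")
```

Under these assumptions, the added sequence sums to 918.2 Mb, the implied genome size is roughly 6.0 Gb (consistent with a diploid human genome), and one error per 100 kb corresponds to about QV 50.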

