A draft human pangenome reference

Wen‐Wei Liao(Washington University in St. Louis), Mobin Asri(University of California, Santa Cruz), Jana Ebler(Heinrich Heine University Düsseldorf), Daniel Doerr(Heinrich Heine University Düsseldorf), Marina Haukness(University of California, Santa Cruz), Glenn Hickey(University of California, Santa Cruz), Shuangjia Lu(Yale University), Julian Lucas(University of California, Santa Cruz), Jean Monlong(University of California, Santa Cruz), Haley Abel(Washington University in St. Louis), Silvia Buonaiuto(Institute of Genetics and Biophysics), Xian Chang(University of California, Santa Cruz), Haoyu Cheng(Harvard University), Justin Chu(Dana-Farber Cancer Institute), Vincenza Colonna(University of Tennessee Health Science Center), Jordan M. Eizenga(University of California, Santa Cruz), Xiaowen Feng(Harvard University), Christian Fischer(University of Tennessee Health Science Center), Robert S. Fulton(James S. McDonnell Foundation), Shilpa Garg(Novo Nordisk Foundation), Cristian Groza(McGill University), Andrea Guarracino(University of Tennessee Health Science Center), William T. Harvey(University of Washington), Simon Heumos(University of Tübingen), Kerstin Howe(Wellcome Sanger Institute), Miten Jain(Northeastern University), Tsung-Yu Lu(University of Southern California), Charles Markello(University of California, Santa Cruz), Fergal J. Martin(European Bioinformatics Institute), Matthew W. Mitchell(Coriell Institute For Medical Research), Katherine M. Munson(University of Washington), Moses Njagi Mwaniki(University of Pisa), Adam M. Novak(University of California, Santa Cruz), Hugh E. Olsen(University of California, Santa Cruz), Trevor Pesout(University of California, Santa Cruz), David Porubskỳ(University of Washington), Pjotr Prins(University of Tennessee Health Science Center), Jonas A. Sibbesen(University of Copenhagen), Jouni Sirén(University of California, Santa Cruz), Chad Tomlinson(James S. McDonnell Foundation), Flavia Villani(University of Tennessee Health Science Center), Mitchell R. Vollger(University of Washington), Lucinda Antonacci-Fulton(James S. McDonnell Foundation), Gunjan Baid(Google (United States)), Carl Baker(University of Washington), Anastasiya Belyaeva(Google (United States)), Konstantinos Billis(European Bioinformatics Institute), Andrew Carroll(Google (United States)), Pi-Chuan Chang(Google (United States)), Sarah Cody(James S. McDonnell Foundation), Daniel E. Cook(Google (United States)), Robert Cook‐Deegan(Washington Center), Omar E. Cornejo(University of California, Santa Cruz), Mark Diekhans(University of California, Santa Cruz), Peter Ebert(Heinrich Heine University Düsseldorf), Susan Fairley(European Bioinformatics Institute), Olivier Fédrigo(Rockefeller University), Adam L. Felsenfeld(National Institutes of Health), Giulio Formenti(Rockefeller University), Adam Frankish(European Bioinformatics Institute), Yan Gao(Children's Hospital of Philadelphia), Nanibaa’ A. Garrison(University of California, Los Angeles), Carlos García Girón(European Bioinformatics Institute), Richard E. Green(University of California, Santa Cruz), Leanne Haggerty(European Bioinformatics Institute), Kendra Hoekzema(University of Washington), Thibaut Hourlier(European Bioinformatics Institute), Hanlee P. Ji(Stanford University), Eimear E. Kenny(Genomic Health (United States)), Barbara A. Koenig(University of California, San Francisco), Alexey Kolesnikov(Google (United States)), Jan O. Korbel(European Bioinformatics Institute), Jennifer Kordosky(University of Washington), Sergey Koren(National Institutes of Health), HoJoon Lee(Stanford University), Alexandra P. Lewis(University of Washington), Hugo Magalhães(Heinrich Heine University Düsseldorf), Santiago Marco‐Sola(Universitat Autònoma de Barcelona), Pierre Marijon(Heinrich Heine University Düsseldorf), Ann M. Mc Cartney(National Institutes of Health), Jennifer McDaniel(National Institute of Standards and Technology), Jacquelyn Mountcastle(Rockefeller University), Maria Nattestad(Google (United States)), Sergey Nurk(National Institutes of Health), Nathan D. Olson(National Institute of Standards and Technology), Alice B. Popejoy(University of California, Davis), Daniela Puiu(Johns Hopkins University), Mikko Rautiainen(National Institutes of Health), Allison Regier(James S. McDonnell Foundation), Arang Rhie(National Institutes of Health), Samuel Sacco(University of California, Santa Cruz), Ashley D. Sanders(Max Delbrück Center), Valérie Schneider(National Institutes of Health), Baergen I. Schultz(National Institutes of Health), Kishwar Shafin(Google (United States)), Michael W. Smith(National Institutes of Health), Heidi J. Sofia(National Institutes of Health), Ahmad Abou Tayoun(Al Jalila Foundation), Françoise Thibaud‐Nissen(National Institutes of Health), Francesca Floriana Tricomi(European Bioinformatics Institute), Justin Wagner(National Institute of Standards and Technology), Brian P. Walenz(National Institutes of Health), Jonathan Wood(Wellcome Sanger Institute), Aleksey V. Zimin(Johns Hopkins University), Guillaume Bourque(Kyoto University), Mark Chaisson(University of Southern California), Paul Flicek(European Bioinformatics Institute), Adam M. Phillippy(National Institutes of Health), Justin M. Zook(National Institute of Standards and Technology), Evan E. Eichler(Howard Hughes Medical Institute), David Haussler(Howard Hughes Medical Institute), Ting Wang(James S. McDonnell Foundation), Erich D. Jarvis(Howard Hughes Medical Institute), Karen H. Miga(University of California, Santa Cruz), Erik Garrison(University of Tennessee Health Science Center), Tobias Marschall(Heinrich Heine University Düsseldorf), Ira M. Hall(Yale University), Heng Li(Harvard University), Benedict Paten(University of California, Santa Cruz)
Nature
May 10, 2023
Cited by 1,125
Open Access

Abstract

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
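
The figures quoted in the abstract imply a few derived quantities that are easy to misread, so the following minimal Python sketch works them out. It is only illustrative arithmetic on the abstract's rounded numbers, not part of the consortium's analysis; the constant and function names are hypothetical, and "roughly 90 million" is treated as exactly 90,000,000 for the sake of the example.

```python
# Rounded figures taken from the abstract (assumed exact for illustration).
DIPLOID_ASSEMBLIES = 47            # phased, diploid assemblies
ADDED_EUCHROMATIC_BP = 119_000_000 # polymorphic sequence added relative to GRCh38
ADDED_FROM_SV_BP = 90_000_000      # "roughly 90 million" from structural variation
SMALL_VARIANT_ERROR_REDUCTION = 0.34  # 34% fewer small variant discovery errors
SV_PER_HAPLOTYPE_INCREASE = 1.04      # 104% more structural variants per haplotype


def haplotype_count(diploid_assemblies: int) -> int:
    """Each phased diploid assembly contributes two haplotypes."""
    return 2 * diploid_assemblies


def sv_fraction(sv_bp: int, total_bp: int) -> float:
    """Share of the added sequence attributed to structural variation."""
    return sv_bp / total_bp


def errors_remaining(reduction: float) -> float:
    """Errors left, as a fraction of the GRCh38-based baseline."""
    return 1.0 - reduction


def fold_change(relative_increase: float) -> float:
    """Convert a relative increase (1.04 = 104%) into a fold change."""
    return 1.0 + relative_increase


if __name__ == "__main__":
    print(f"haplotypes: {haplotype_count(DIPLOID_ASSEMBLIES)}")
    print(f"SV share of added sequence: "
          f"{sv_fraction(ADDED_FROM_SV_BP, ADDED_EUCHROMATIC_BP):.1%}")
    print(f"small variant errors remaining: "
          f"{errors_remaining(SMALL_VARIANT_ERROR_REDUCTION):.0%} of baseline")
    print(f"SVs detected per haplotype: "
          f"{fold_change(SV_PER_HAPLOTYPE_INCREASE):.2f}x baseline")
```

Read this way, a 104% increase means roughly twice as many structural variants detected per haplotype as with the GRCh38-based workflows, and about three quarters of the added sequence comes from structural variation.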

