A genome-wide mutational constraint map quantified from variation in 76,156 human genomes

Siwei Chen (Massachusetts General Hospital), Laurent C. Francioli (Broad Institute), Julia K. Goodrich (Broad Institute), Ryan L. Collins (Broad Institute), Masahiro Kanai (Broad Institute), Qingbo S. Wang (Broad Institute), Jessica Alföldi (Broad Institute), Nicholas A. Watts (Broad Institute), Christopher Vittal (Broad Institute), Laura D. Gauthier (Broad Institute), Timothy Poterba (Broad Institute), Michael W. Wilson (Broad Institute), Yekaterina Tarasova (Broad Institute), William Phu (Broad Institute), Mary T. Yohannes (Broad Institute), Zan Koenig (Broad Institute), Yossi Farjoun (Broad Institute), Eric Banks (Broad Institute), Stacey Donnelly (Broad Institute), Stacey Gabriel (Broad Institute), Namrata Gupta (Broad Institute), Steven Ferriera (Broad Institute), Charlotte Tolonen (Broad Institute), Sam Novod (Broad Institute), Louis Bergelson (Broad Institute), David Roazen (Broad Institute), Valentín Ruano-Rubio (Broad Institute), Miguel Covarrubias (Broad Institute), Christopher Llanwarne (Broad Institute), Nikelle Petrillo (Broad Institute), Gordon Wade (Broad Institute), Thibault Jeandet (Broad Institute), Ruchi Munshi (Broad Institute), Kathleen Tibbetts (Broad Institute), Anne O’Donnell‐Luria (Broad Institute), Matthew Solomonson (Broad Institute), Cotton Seed (Broad Institute), Alicia R. Martin (Broad Institute), Michael E. Talkowski (Broad Institute), Heidi L. Rehm (Broad Institute), Mark J. Daly (Broad Institute), Grace Tiao (Broad Institute), Benjamin M. Neale (Broad Institute), Daniel G. MacArthur (Broad Institute), Konrad J. Karczewski (Broad Institute)
bioRxiv (Cold Spring Harbor Laboratory)
March 21, 2022
Open Access

Abstract

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proven more difficult. Here we aggregate, process, and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome reference dataset, and use this dataset to build a mutational constraint map for the whole genome. We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation across the genome. As expected, protein-coding sequences overall are under stronger constraint than non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and for variants implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association, and natural selection in non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, while non-coding constraint captures additional functional information that gene constraint metrics underrecognize. We demonstrate that this genome-wide constraint map provides an effective approach for characterizing the non-coding genome and improving the identification and interpretation of functional human genetic variation.
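The abstract does not specify the exact form of the mutational model, so the following is only a minimal sketch of the general observed-versus-expected idea behind a constraint score, under stated assumptions. It assumes that each base's mutability depends on its trinucleotide context, that regional genomic features can be folded in as a single multiplicative factor, and that depletion is summarized as a Poisson-style Z score. The rate table `HYPOTHETICAL_MU`, the `scale` and `regional_factor` parameters, and the function names are all hypothetical illustrations, not the authors' calibrated model or code.

```python
import math
from typing import Dict

# HYPOTHETICAL per-site mutation rates keyed by trinucleotide context
# (illustrative values only; not gnomAD's calibrated model).
HYPOTHETICAL_MU: Dict[str, float] = {
    "ACG": 1.0e-7,  # CpG context, typically highly mutable
    "ACA": 1.5e-8,
    "TCT": 1.2e-8,
    "GCC": 2.0e-8,
}


def expected_variants(
    seq: str,
    mu: Dict[str, float],
    scale: float,
    regional_factor: float = 1.0,
) -> float:
    """Expected variant count for a sequence window.

    Sums the rate of each interior base's trinucleotide context, then applies
    a sample-size scale and a hypothetical multiplicative adjustment standing
    in for regional genomic features. Contexts missing from `mu` contribute 0.
    """
    total = 0.0
    for i in range(1, len(seq) - 1):
        total += mu.get(seq[i - 1 : i + 2], 0.0)
    return total * scale * regional_factor


def constraint_z(observed: int, expected: float) -> float:
    """Poisson-style Z score; positive means fewer variants than expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


if __name__ == "__main__":
    window = "ACGACATCTGCC"
    exp = expected_variants(window, HYPOTHETICAL_MU, scale=1e8)
    print(f"expected={exp:.2f}, z={constraint_z(3, exp):.2f}")
```

Dividing by the square root of the expected count, rather than reporting a raw observed/expected ratio, keeps windows with few expected variants from producing extreme scores on small absolute differences.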

