Increased mutation and gene conversion within human segmental duplications

Mitchell R. Vollger(University of Washington), Philip C. Dishuck(University of Washington), William T. Harvey(University of Washington), William S. DeWitt(University of Washington), Xavi Guitart(University of Washington), Michael E. Goldberg(University of Washington), Allison N. Rozanski(University of Washington), Julian Lucas(University of California, Santa Cruz), Mobin Asri(University of California, Santa Cruz), Haley Abel(Washington University in St. Louis), Lucinda Antonacci-Fulton(James S. McDonnell Foundation), Gunjan Baid(Google (United States)), Carl Baker(University of Washington), Anastasiya Belyaeva(Google (United States)), Konstantinos Billis(European Bioinformatics Institute), Guillaume Bourque(Kyoto University), Silvia Buonaiuto(Institute of Genetics and Biophysics), Andrew Carroll(Google (United States)), Mark Chaisson(University of Southern California), Pi-Chuan Chang(Google (United States)), Xian Chang(University of California, Santa Cruz), Haoyu Cheng(Harvard University), Justin Chu(Dana-Farber Cancer Institute), Sarah Cody(James S. McDonnell Foundation), Vincenza Colonna(University of Tennessee Health Science Center), Daniel E. Cook(Google (United States)), Robert Cook‐Deegan(Washington Center), Omar E. Cornejo(University of California, Santa Cruz), Mark Diekhans(University of California, Santa Cruz), Daniel Doerr(Heinrich Heine University Düsseldorf), Peter Ebert(Heinrich Heine University Düsseldorf), Jana Ebler(Heinrich Heine University Düsseldorf), Jordan M. Eizenga(University of California, Santa Cruz), Susan Fairley(European Bioinformatics Institute), Olivier Fédrigo(Rockefeller University), Adam L. Felsenfeld(National Institutes of Health), Xiaowen Feng(Harvard University), Christian Fischer(University of Tennessee Health Science Center), Paul Flicek(European Bioinformatics Institute), Giulio Formenti(Rockefeller University), Adam Frankish(European Bioinformatics Institute), Robert S. Fulton(James S. McDonnell Foundation), Yan Gao(Children's Hospital of Philadelphia), Shilpa Garg(Novo Nordisk Foundation), Erik Garrison(University of Tennessee Health Science Center), Nanibaa’ A. Garrison(University of California, Los Angeles), Carlos García Girón(European Bioinformatics Institute), Richard E. Green(University of California, Santa Cruz), Cristian Groza(McGill University), Andrea Guarracino(University of Tennessee Health Science Center), Leanne Haggerty(European Bioinformatics Institute), Ira M. Hall(Yale University), Marina Haukness(University of California, Santa Cruz), David Haussler(Howard Hughes Medical Institute), Simon Heumos(University of Tübingen), Glenn Hickey(University of California, Santa Cruz), Thibaut Hourlier(European Bioinformatics Institute), Kerstin Howe(Wellcome Sanger Institute), Miten Jain(Northeastern University), Erich D. Jarvis(Howard Hughes Medical Institute), Hanlee P. Ji(Stanford University), Eimear E. Kenny(Genomic Health (United States)), Barbara A. Koenig(University of California, San Francisco), Alexey Kolesnikov(Google (United States)), Jan O. Korbel(European Bioinformatics Institute), Jennifer Kordosky(University of Washington), Sergey Koren(National Institutes of Health), HoJoon Lee(Stanford University), Heng Li(Harvard University), Wen‐Wei Liao(Washington University in St. Louis), Shuangjia Lu(Yale University), Tsung-Yu Lu(University of Southern California), Julian Lucas(University of California, Santa Cruz), Hugo Magalhães(Heinrich Heine University Düsseldorf), Santiago Marco‐Sola(Universitat Autònoma de Barcelona), Pierre Marijon(Heinrich Heine University Düsseldorf), Charles Markello(University of California, Santa Cruz), Tobias Marschall(Heinrich Heine University Düsseldorf), Fergal J. Martin(European Bioinformatics Institute), Ann M. Mc Cartney(National Institutes of Health), Jennifer McDaniel(National Institute of Standards and Technology), Karen H. Miga(University of California, Santa Cruz), Matthew W. Mitchell(Coriell Institute For Medical Research), Jean Monlong(University of California, Santa Cruz), Jacquelyn Mountcastle(Rockefeller University), Moses Njagi Mwaniki(University of Pisa), Maria Nattestad(Google (United States)), Adam M. Novak(University of California, Santa Cruz), Sergey Nurk(National Institutes of Health), Hugh E. Olsen(University of California, Santa Cruz), Nathan D. Olson(National Institute of Standards and Technology), Benedict Paten(University of California, Santa Cruz), Trevor Pesout(University of California, Santa Cruz), Adam M. Phillippy(National Institutes of Health), Alice B. Popejoy(University of California, Davis), Pjotr Prins(University of Tennessee Health Science Center), Daniela Puiu(Johns Hopkins University), Mikko Rautiainen(National Institutes of Health), Allison Regier(James S. McDonnell Foundation), Arang Rhie(National Institutes of Health), Samuel Sacco(University of California, Santa Cruz), Ashley D. Sanders(Max Delbrück Center), Valérie Schneider(National Institutes of Health), Baergen I. Schultz(National Institutes of Health), Kishwar Shafin(Google (United States)), Jonas A. Sibbesen(University of Copenhagen), Jouni Sirén(University of California, Santa Cruz), Michael W. Smith(National Institutes of Health), Heidi J. Sofia(National Institutes of Health), Ahmad Abou Tayoun(Al Jalila Foundation), Françoise Thibaud-Nissen(National Institutes of Health), Chad Tomlinson(James S. McDonnell Foundation), Francesca Floriana Tricomi(European Bioinformatics Institute), Flavia Villani(University of Tennessee Health Science Center), Mitchell R. Vollger(University of Washington), Justin Wagner(National Institute of Standards and Technology), Brian P. Walenz(National Institutes of Health), Ting Wang(James S. McDonnell Foundation), Jonathan Wood(Wellcome Sanger Institute), Aleksey V. Zimin(Johns Hopkins University), Justin M. Zook(National Institute of Standards and Technology), Katherine M. Munson(University of Washington), Alexandra P. Lewis(University of Washington), Kendra Hoekzema(University of Washington), Glennis A. Logsdon(University of Washington), David Porubskỳ(University of Washington), Benedict Paten(University of California, Santa Cruz), Kelley Harris(University of Washington), PingHsun Hsieh(University of Washington), Evan E. Eichler(Howard Hughes Medical Institute)
Nature
May 10, 2023
Cited by 112
Open Access

Abstract

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated by 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC), with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
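
The mutational-spectrum comparison described above can be illustrated with a minimal sketch. It assumes the common strand-symmetric convention in which each SNV is collapsed onto its pyrimidine reference base, so that cytosine-to-guanine and guanine-to-cytosine changes fall into one C>G class; the abstract does not say this is the exact procedure the authors used. The function names and the toy SNV lists are hypothetical and are not data from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_class(ref, alt):
    """Collapse an SNV onto its pyrimidine-reference form, e.g. G>C becomes C>G."""
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Return the fraction of SNVs in each collapsed mutation class."""
    counts = Counter(mutation_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def relative_change(sd_fraction, unique_fraction):
    """Relative change of a class frequency in SDs versus unique sequence."""
    return (sd_fraction - unique_fraction) / unique_fraction


# Hypothetical toy SNVs as (ref, alt) pairs; NOT data from the study.
toy_unique = [("C", "T"), ("G", "A"), ("C", "G"), ("A", "G")]
toy_sd = [("C", "G"), ("G", "C"), ("C", "T"), ("T", "C")]

unique_spec = spectrum(toy_unique)
sd_spec = spectrum(toy_sd)
cg_change = relative_change(sd_spec["C>G"], unique_spec["C>G"])
print(f"Relative change in C>G frequency (toy data): {cg_change:+.1%}")
```

In this convention, a reported figure such as a 27.1% increase would correspond to a `relative_change` value of 0.271 for the C>G class. Extending the sketch to triplet contexts or CpG sites would additionally require the flanking reference bases for each SNV.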

