An expansive human regulatory lexicon encoded in transcription factor footprints

Shane Neph (University of Washington), Jeff Vierstra (University of Washington), Andrew B. Stergachis (University of Washington), Alex Reynolds (University of Washington), Eric Haugen (University of Washington), Benjamin Vernot (University of Washington), Robert E. Thurman (University of Washington), Sam John (University of Washington), Richard Sandstrom (University of Washington), Audra Johnson (University of Washington), Matthew T. Maurano (University of Washington), Richard Humbert (University of Washington), Eric Rynes (University of Washington), Hao Wang (University of Washington), Shinny Vong (University of Washington), Kristen Lee (University of Washington), Daniel Bates (University of Washington), Morgan Diegel (University of Washington), Vaughn Roach (University of Washington), Douglas Dunn (University of Washington), Jun Neri (University of Washington), Anthony Schafer (University of Washington), R. Scott Hansen (University of Washington), Tanya Kutyavin (University of Washington), Erika Giste (University of Washington), Molly Weaver (University of Washington), Theresa K. Canfield (University of Washington), Peter J. Sabo (University of Washington), Miaohua Zhang (Fred Hutch Cancer Center), Gayathri Balasundaram (Fred Hutch Cancer Center), Rachel Byron (Fred Hutch Cancer Center), Michael J. MacCoss (University of Washington), Joshua M. Akey (University of Washington), M. A. Bender (Fred Hutch Cancer Center), Mark Groudine (Fred Hutch Cancer Center), Rajinder Kaul (University of Washington Medical Center), J. Stamatoyannopoulos (University of Washington)
Nature
September 1, 2012

Abstract

Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein–DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

DNase I footprinting in 41 cell and tissue types reveals millions of short sequence elements encoding an expansive repertoire of conserved recognition sequences for DNA-binding proteins.

DNase I footprinting detects DNA sequences that are protected from cleavage by DNase I because they are bound by regulatory factors. Studying these footprints in 41 diverse cell and tissue types, the authors describe millions of short sequence elements that are conserved recognition sequences for DNA-binding proteins. The effort nearly doubles the size of the human cis-regulatory lexicon and provides insight into chromatin states and levels of evolutionary conservation. A large collection of novel regulatory-factor recognition motifs that closely parallel major regulators of development, differentiation and pluripotency is also described.
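The core idea above is that a bound factor leaves a short stretch of DNA with fewer DNase I cuts than the sequence immediately around it. The sketch below scans a per-base cleavage-count track for such local depletions. It is illustrative only: the core-versus-flank score, the pseudocount of 1, the 6 to 40 bp width range, the 3 bp flanks and the 0.5 threshold are all assumptions made for this example, not parameters reported in this abstract, and the function names are hypothetical. Real footprinting pipelines also have to account for DNase I sequence bias and assess statistical significance.

```python
from typing import List, Tuple


def protection_score(cuts: List[int], start: int, end: int, flank: int) -> float:
    """Hypothetical depletion score for a candidate footprint cuts[start:end].

    Compares the mean per-base cleavage count in the core with the mean over
    `flank` bases on each side; a pseudocount of 1 avoids division by zero.
    Lower scores mean fewer core cuts relative to the flanks (deeper protection).
    """
    core = cuts[start:end]
    left = cuts[start - flank:start]
    right = cuts[end:end + flank]
    core_mean = sum(core) / len(core)
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    return (core_mean + 1) / (left_mean + 1) + (core_mean + 1) / (right_mean + 1)


def find_footprints(
    cuts: List[int],
    widths: range = range(6, 41),
    flank: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[float, int, int]]:
    """Greedily pick non-overlapping windows whose score is below `threshold`.

    Returns (score, start, end) tuples with half-open [start, end) coordinates,
    ordered by start position.
    """
    candidates: List[Tuple[float, int, int]] = []
    for w in widths:
        # Keep the core plus both flanks inside the track.
        for start in range(flank, len(cuts) - w - flank + 1):
            score = protection_score(cuts, start, start + w, flank)
            if score < threshold:
                candidates.append((score, start, start + w))
    candidates.sort()  # most strongly protected windows first
    chosen: List[Tuple[float, int, int]] = []
    for score, a, b in candidates:
        # Accept a window only if it does not overlap any already-chosen one.
        if all(b <= ca or a >= cb for _, ca, cb in chosen):
            chosen.append((score, a, b))
    return sorted(chosen, key=lambda t: t[1])


if __name__ == "__main__":
    # Toy track: 20 cleaved bases, 10 protected bases, 20 cleaved bases.
    track = [10] * 20 + [0] * 10 + [10] * 20
    print(find_footprints(track))  # one window spanning positions 20..30 (score 2/11)
```

Picking the lowest-scoring window first and discarding anything that overlaps it keeps one call per protected stretch, rather than reporting every nested sub-window of the same depletion.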
