The ENCODE Uniform Analysis Pipelines

Benjamin C. Hitz (Stanford University), Jin-Wook Lee (Stanford University), Otto Jolanki (Stanford University), Meenakshi S. Kagda (Stanford University), Keenan Graham (Stanford University), Paul Sud (Stanford University), Idan Gabdank (Stanford University), J. Seth Strattan (Stanford University), Cricket A. Sloan (Stanford University), Timothy R. Dreszer (Stanford University), Laurence D. Rowe (Stanford University), Nikhil R. Podduturi (Stanford University), Venkat S. Malladi (Stanford University), Esther T. Chan (Stanford University), Jean M. Davidson (Stanford University), Marcus Ho (Stanford University), Stuart R. Miyasato (Stanford University), Matt Simison (Stanford University), Forrest Y. Tanaka (Stanford University), Yunhai Luo (Stanford University), Ian Whaling (Stanford University), Eurie L. Hong (Stanford University), Brian T. Lee (University of California, Santa Cruz), Richard Sandstrom (Altius Institute for Biomedical Sciences), Eric Rynes (Altius Institute for Biomedical Sciences), Jemma Nelson (Altius Institute for Biomedical Sciences), Andrew Nishida (Altius Institute for Biomedical Sciences), Alyssa Ingersoll (Altius Institute for Biomedical Sciences), Michael Buckley (Altius Institute for Biomedical Sciences), Mark Frerker (Altius Institute for Biomedical Sciences), Daniel Sunwook Kim (Palo Alto University), Nathan Boley (Palo Alto University), Diane Trout (California Institute of Technology), Alexander Dobin (Cold Spring Harbor Laboratory), Sorena Rahmanian (University of California, Irvine), Dana Wyman (University of California, Irvine), Gabriela Balderrama-Gutierrez (University of California, Irvine), Fairlie Reese (University of California, Irvine), Neva C. Durand (Broad Institute), Olga Dudchenko (Baylor College of Medicine), David Weisz (Baylor College of Medicine), Suhas S.P. Rao (University of California, San Francisco), Alyssa Blackburn (Baylor College of Medicine), Dimos Gkountaroulis (Baylor College of Medicine), Mahdi Sadr (Baylor College of Medicine), Moshe Olshansky (Broad Institute), Yossi Eliaz (Baylor College of Medicine), Dat Nguyen (Baylor College of Medicine), Ivan D. Bochkov (Baylor College of Medicine), Muhammad S. Shamim (Baylor College of Medicine), Ragini Mahajan (Center for Theoretical Biological Physics), Erez Lieberman Aiden (Broad Institute), T Gingeras (Cold Spring Harbor Laboratory), Simon Heath (Universitat Pompeu Fabra), Martin Hirst (University of British Columbia), W. James Kent (University of California, Santa Cruz), Anshul Kundaje (Palo Alto University), A Mortazavi (University of California, Irvine), B Wold (California Institute of Technology), J. Michael Cherry (Stanford University)
bioRxiv (Cold Spring Harbor Laboratory)
April 6, 2023
Cited by 99
Open Access

Abstract

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscapes of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and the Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), making it accessible to a diverse range of biomedical researchers. The ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, on local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardizing the computational methodologies for analysis and quality control yields comparable results across different ENCODE collections, a prerequisite for successful integrative analyses.

Database URL: https://www.encodeproject.org/

