Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space

Michael C. Schatz(Johns Hopkins University), Anthony Philippakis(Broad Institute), Enis Afgan(Johns Hopkins University), Eric Banks(Broad Institute), Vincent J. Carey(Harvard University), Robert J. Carroll(Vanderbilt University Medical Center), Alessandro Culotti(Broad Institute), Kyle Ellrott(Oregon Health & Science University), Jeremy Goecks(Oregon Health & Science University), Robert L. Grossman(University of Chicago), Ira M. Hall(Yale University), Kasper D. Hansen(Johns Hopkins University), Jonathan Lawson(Broad Institute), Jeffrey T. Leek(Johns Hopkins University), Anne O’Donnell‐Luria(Broad Institute), Stephen Mosher(Johns Hopkins University), Martin Morgan(Roswell Park Comprehensive Cancer Center), Anton Nekrutenko(Pennsylvania State University), Brian D. O’Connor(Broad Institute), Kevin Osborn(University of California, Santa Cruz), Benedict Paten(University of California, Santa Cruz), Candace Patterson(Broad Institute), Frederick J. Tan(Department of Embryology), Casey Overby Taylor(Johns Hopkins University), Jennifer Vessio(Johns Hopkins University), Levi Waldron(City University of New York), Ting Wang(Washington University in St. Louis), Kristin Wuichet(Vanderbilt University Medical Center), Alexander Baumann, Andrew Rula, Anton Kovalsy(Pennsylvania State University), C. Bernard, Derek Caetano-Anollés, Geraldine Van Der Auwera, Justin Canas, K. Ümit Yüksel, Kate Herman, Megan Taylor(Johns Hopkins University), Marianie Simeon, Michaël Baumann(Johns Hopkins University), Qi Wang(Washington University in St. Louis), Robert Title(University of Chicago), Ruchi Munshi, Sushma Chaluvadi, Valerie B Reeves, William Disman, Salin Thomas, Allie Hajian, Elizabeth Kiernan, Namrata Gupta, Trish Vosburg, Ludwig Geistlinger, Marcel Ramos, Sehyun Oh, Dave Rogers, Frances McDade, Mim Hastie, Nitesh Turaga, Alexander Ostrovsky, Alexandru Mahmoud, Dannon Baker, Dave Clements, Katherine E.L. Cox, Keith Suderman, Nataliya Kucher, Sergey Golitsynskiy, Samantha Zarate, Sarah J. Wheelan, Kai Kammers, Ana Stevens, Carolyn M. Hutter, Christopher Wellington, Elena M. Ghanaim, Ken Wiley, Shurjo K. Sen(Johns Hopkins University), Valentina Di Francesco, Denis Yuen, Brian Walsh(Broad Institute), Luke Sargent, Vahid Jalili, John Chilton, Lori Shepherd, Benjamin J. Stubbs, Ash O’Farrell, Benton A. Vizzier, Charles Overbeck, Charles Reid, David Steinberg, Elizabeth A. Sheets, Julian Lucas, Lon Blauvelt, Louise Cabansay, Noah Warren, Brian Hannafious(Broad Institute), Tim Harris, Radhika Reddy, Eric S. Torstenson(Broad Institute), M. Katie Banasiewicz, Haley Abel, Jason Walker
Cell Genomics
January 1, 2022
Cited by 128
Open Access

Abstract

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

