Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

Zichen Wang(Illumina (United States)), Caroline D. Monteiro(Illumina (United States)), Kathleen M. Jagodnik(Illumina (United States)), Nicolas Fernandez(Illumina (United States)), Gregory W. Gundersen(Illumina (United States)), Andrew D. Rouillard(Illumina (United States)), Sherry L. Jenkins(Illumina (United States)), Axel S. Feldmann(Illumina (United States)), Kevin Hu(Illumina (United States)), Michael G. McDermott(Illumina (United States)), Qiaonan Duan(Illumina (United States)), Neil R. Clark(Illumina (United States)), Matthew R. Jones(Illumina (United States)), Yan Kou(Illumina (United States)), Troy Goff(Illumina (United States)), Holly Woodland(Petroleum Geo-Services (United Kingdom)), Fabio M. R. Amaral(University of Nottingham), Gregory L. Szeto(Ragon Institute of MGH, MIT and Harvard), Oliver Fuchs(German Center for Lung Research), Sophia Miryam Schüssler‐Fiorenza Rose(VA Palo Alto Health Care System), Shvetank Sharma(Institute of Liver and Biliary Sciences), Uwe Schwartz(University of Regensburg), Xabier Bengoetxea Bausela(Universidad de Navarra), Maciej Szymkiewicz(Warsaw School of Information Technology), Vasileios Maroulis, Anton Salykin(Masaryk University), Carolina Barra(Hospital Del Mar), Candice D. Kruth(APT Therapeutics (United States)), Nicholas J. Bongio(Shenandoah University), Vaibhav Mathur(IBM (India)), Radmila D Todoric(Sidra Medical and Research Center), Udi Rubin(Columbia University), Apostolos Malatras(Centre National de la Recherche Scientifique), Carl T. Fulp(Shibuya (Japan)), John A. Galindo(Universidad Nacional de Colombia), Ruta Motiejunaite(Brigham and Women's Hospital), Christoph Jüschke(Carl von Ossietzky Universität Oldenburg), Philip C. Dishuck, Katharina Lahl(Technical University of Denmark), Mohieddin Jafari(Pasteur Institute of Iran), Sara Aibar(Universidad de Salamanca), Apostolos Zaravinos(European University Cyprus), Linda H. Steenhuizen(Annamalai University), Lindsey R. Allison, Pablo Gamallo, Fernando de Andrés(Government of Extremadura), Tyler Dae Devlin(Providence College), Vicente Pérez-García(Consejo Superior de Investigaciones Científicas), Avi Ma’ayan(Illumina (United States))
Nature Communications
September 26, 2016
Cited by 298
Open Access

Abstract

Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but require further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from the Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.

