Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

Wenqian Zhang(BGI Group (China)), Ying Yu(Fudan University), Falk Hertwig(University of Cologne), Jean Thierry‐Mieg(National Center for Biotechnology Information), Wenwei Zhang(BGI Group (China)), Danielle Thierry‐Mieg(National Center for Biotechnology Information), Jian Wang(BGI Group (China)), Cesare Furlanello(Fondazione Bruno Kessler), Viswanath Devanarayan(AbbVie (United States)), Jie Cheng(South College), Youping Deng(Rush University), Barbara Hero(University Hospital Cologne), Huixiao Hong(National Center for Toxicological Research), Meiwen Jia(Fudan University), Li Li(SAS Institute (United States)), Simon Lin(Marshfield Clinic), Yuri Nikolsky(Thomson Reuters (United States)), André Oberthuer(University Hospital Cologne), Tao Qing(Fudan University), Zhenqiang Su(National Center for Toxicological Research), Ruth Volland(University Hospital Cologne), Charles Wang(Loma Linda University), May D. Wang(Georgia Institute of Technology), Junmei Ai(Rush University), Davide Albanese(Fondazione Edmund Mach), Shahab Asgharzadeh(Children's Hospital of Los Angeles), Smadar Avigad(Schneider Children's Medical Center), Wenjun Bao(SAS Institute (United States)), Marina Bessarabova(Thomson Reuters (United States)), Murray H. Brilliant(Marshfield Clinic), Benedikt Brors(German Cancer Research Center), Marco Chierici(Fondazione Bruno Kessler), Tzu‐Ming Chu(SAS Institute (United States)), Jibin Zhang(BGI Group (China)), Richard G. Grundy(University of Nottingham), Min He(Marshfield Clinic), Scott J. Hebbring(Marshfield Clinic), Howard L. Kaufman(Rush University), Samir Lababidi(Center for Biologics Evaluation and Research), Lee Lancashire(Thomson Reuters (United States)), Yan Li(Rush University), Xin X. Lu(AbbVie (United States)), Heng Luo(University of Arkansas at Little Rock), Xiwen Ma(Eli Lilly (United States)), Baitang Ning(National Center for Toxicological Research), Rosa Noguera(Universitat de València), Martin Peifer(University of Cologne), John H. Phan(Georgia Institute of Technology), Frederik Roels(University of Cologne), Carolina Rosswog(University Hospital Cologne), Susan Shao(SAS Institute (United States)), Jie Shen(National Center for Toxicological Research), Jessica Theißen(University Hospital Cologne), Gian Paolo Tonini(University of Padua), Jo Vandesompele(Ghent University Hospital), Po-Yen Wu(Georgia Institute of Technology), Wenzhong Xiao(Harvard University), Joshua Xu(National Center for Toxicological Research), Weihong Xu(Stanford University), Jiekun Xuan(National Center for Toxicological Research), Yong Yang(Eli Lilly (United States)), Zhan Ye(Marshfield Clinic), Zirui Dong(BGI Group (China)), Ke Zhang(University of North Dakota), Ye Yin(BGI Group (China)), Chen Zhao(Fudan University), Yuanting Zheng(Fudan University), Russell D. Wolfinger(SAS Institute (United States)), Tieliu Shi(East China Normal University), Linda H. Malkas(City of Hope), Frank Berthold(University of Cologne), Jun Wang(BGI Group (China)), Weida Tong(National Center for Toxicological Research), Leming Shi(National Center for Toxicological Research), Zhiyu Peng(BGI Group (China)), Matthias Fischer(University of Cologne)
Genome Biology
June 24, 2015
Cited by 429; Open Access

Abstract

BACKGROUND: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.

RESULTS: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models.

CONCLUSIONS: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
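The random division of the cohort into training and validation sets can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not state the split ratio or how the randomization was done, so the 50/50 fraction, the seed, the function name `split_cohort`, and the sample IDs below are all hypothetical. Only the cohort size of 498 comes from the abstract.

```python
import random


def split_cohort(sample_ids, train_fraction=0.5, seed=0):
    """Randomly partition sample IDs into training and validation lists.

    train_fraction and seed are illustrative assumptions, not values
    reported in the abstract.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)  # private generator so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# 498 placeholder IDs standing in for the primary neuroblastoma samples.
cohort = [f"NB{i:03d}" for i in range(1, 499)]
train_ids, valid_ids = split_cohort(cohort)
print(len(train_ids), len(valid_ids))
```

Fixing the seed keeps the partition identical for every platform and feature level, so that models built on RNA-seq and on microarray data would be trained and validated on the same samples.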

