The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Leming Shi (National Center for Toxicological Research), Wendell Jones (Protein Express (United States)), Roderick V. Jensen (Boston University), Stephen Harris (National Center for Toxicological Research), Roger Perkins (ICF International (United States)), Federico Goodsaid (United States Food and Drug Administration), Lei Guo (National Center for Toxicological Research), Lisa J. Croner (Biogen (United States)), Cecilie Boysen, Hong Fang (ICF International (United States)), Feng Qian (ICF International (United States)), Shashi Amur (United States Food and Drug Administration), Wenjun Bao (SAS Institute (United States)), Cátálin Bárbácioru, Vincent Bertholet (Eppendorf (Belgium)), Xiaoxi Cao (ICF International (United States)), Tzu‐Ming Chu (SAS Institute (United States)), Patrick Collins (Agilent Technologies (United States)), Xiaohui Fan (National Center for Toxicological Research), Felix W. Frueh (United States Food and Drug Administration), James C. Fuscoe (National Center for Toxicological Research), Xu Guo, Jing Han (Center for Biologics Evaluation and Research), Damir Herman (National Institutes of Health), Huixiao Hong (ICF International (United States)), Ernest S. Kawasaki (National Cancer Institute), Quan‐Zhen Li (The University of Texas Southwestern Medical Center), Yuling Luo (Paragon Genomics (United States)), Yunqing Ma (Paragon Genomics (United States)), Nan Mei (National Center for Toxicological Research), Ron Peterson (Novartis (Switzerland)), Raj K. Puri (Center for Biologics Evaluation and Research), Richard Shippy, Zhenqiang Su (National Center for Toxicological Research), Yongming Sun, Hongmei Sun (ICF International (United States)), Brett T. Thorn (ICF International (United States)), Yaron Turpaz (Zhejiang University), Charles Wang (Cedars-Sinai Medical Center), Sue Jane Wang (United States Food and Drug Administration), Janet A. Warrington, James C. Willey (University of Toledo Medical Center), Jie Wu (ICF International (United States)), Qian Xie (ICF International (United States)), Liang Zhang, Lu Zhang, Sheng Zhong (University of Illinois Urbana-Champaign), Russell D. Wolfinger (SAS Institute (United States)), Weida Tong (National Center for Toxicological Research)
BMC Bioinformatics
August 1, 2008
Cited by 402. Open Access.

Abstract

BACKGROUND: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resulting variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.
RESULTS: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated how a few widely used gene selection procedures affect the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, widely regarded as the "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff as a filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
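Point (3) can be illustrated with a small simulation. The sketch below is not the simulation used in the study: the gene counts, effect sizes, noise levels, the number of replicates, and the overlap measure `percent_overlap` are all assumptions chosen for illustration. It simulates two independent "sites" measuring the same genes, ranks genes either by absolute mean difference (a stand-in for log FC) or by absolute t-statistic, and reports how much the two sites' short lists overlap. Ranking by |t| is used in place of ranking by P because, for a two-sided test with the same degrees of freedom for every gene, the two orderings are identical.

```python
import random
import statistics


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists of equal length."""
    if len(list_a) != len(list_b) or not list_a:
        raise ValueError("lists must be non-empty and of equal length")
    return 100.0 * len(set(list_a) & set(list_b)) / len(list_a)


def site_stats(true_diff, sd, n_rep, rng):
    """Simulate one site: return (mean difference, t-statistic) per gene."""
    out = []
    for d, s in zip(true_diff, sd):
        a = [rng.gauss(0.0, s) for _ in range(n_rep)]  # control group
        b = [rng.gauss(d, s) for _ in range(n_rep)]    # treated group
        diff = statistics.fmean(b) - statistics.fmean(a)
        # Pooled variance for two groups of equal size.
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        t = diff / (2 * pooled_var / n_rep) ** 0.5
        out.append((diff, t))
    return out


def top_n(stats, n, column):
    """Indices of the n genes with the largest |value| in the given column."""
    order = sorted(range(len(stats)),
                   key=lambda g: abs(stats[g][column]), reverse=True)
    return order[:n]


# All numbers below are illustrative assumptions, not values from the study.
rng = random.Random(0)
n_genes, n_rep, n_top = 2000, 5, 50
true_diff = [rng.choice([-1, 1]) * rng.uniform(0.5, 3.0) if g < 200 else 0.0
             for g in range(n_genes)]
sd = [rng.uniform(0.1, 0.6) for _ in range(n_genes)]

site1 = site_stats(true_diff, sd, n_rep, rng)
site2 = site_stats(true_diff, sd, n_rep, rng)

overlap_fc = percent_overlap(top_n(site1, n_top, 0), top_n(site2, n_top, 0))
overlap_t = percent_overlap(top_n(site1, n_top, 1), top_n(site2, n_top, 1))
print(f"FC ranking: {overlap_fc:.0f}% overlap; t ranking: {overlap_t:.0f}% overlap")
```

With few replicates, the per-gene variance estimate in the denominator of t is itself noisy, so a gene's t-value can change substantially between sites even when its mean difference is stable. Varying `n_top` and `n_rep` shows how list length and replication affect the two overlap figures under these assumptions.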
CONCLUSION: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance for choosing the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
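The recommended procedure can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `GeneStat` record, the function names, the default cutoff of 0.05, and the example values are all hypothetical. Per-gene log2 fold changes and P-values are assumed to have been computed beforehand, for example with a simple t-test.

```python
from typing import List, NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change between the two sample groups
    p_value: float  # P-value computed beforehand, e.g. by a simple t-test


def select_degs(stats: List[GeneStat], n_genes: int,
                p_cutoff: float = 0.05) -> List[str]:
    """FC-ranking with a non-stringent P filter.

    Keep genes with P below p_cutoff (0.05 is an assumed default),
    then return the n_genes with the largest absolute fold change.
    """
    passing = [s for s in stats if s.p_value < p_cutoff]
    ranked = sorted(passing, key=lambda s: abs(s.log2_fc), reverse=True)
    return [s.gene for s in ranked[:n_genes]]


def select_by_p(stats: List[GeneStat], n_genes: int) -> List[str]:
    """P-ranking alone, shown only for comparison."""
    ranked = sorted(stats, key=lambda s: s.p_value)
    return [s.gene for s in ranked[:n_genes]]


# Made-up values for illustration only.
example = [
    GeneStat("A", 3.0, 0.01),
    GeneStat("B", -2.5, 0.04),
    GeneStat("C", 0.3, 0.0001),  # tiny effect, very small P
    GeneStat("D", 4.0, 0.20),    # large effect, fails the P filter
    GeneStat("E", 1.2, 0.03),
]

print(select_degs(example, 2))  # ['A', 'B']
print(select_by_p(example, 2))  # ['C', 'A']
```

In this toy input, P-ranking alone puts gene C first despite its small fold change, while FC-ranking with the P filter selects the two genes with the largest changes that also pass the filter. The P cutoff acts only as a filter, so making it more stringent shrinks the candidate pool without changing how the remaining genes are ordered.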

