Western University
ORCID: 0000-0001-9001-4999
Publishes on Gene expression and cancer classification, Statistical Methods and Inference, Bioinformatics and Genomic Networks. 507 papers and 14.4k citations.
BACKGROUND: In cancer studies, it is common for multiple microarray experiments to be conducted that measure the same clinical outcome and the expression levels of the same set of genes. An important goal of such experiments is to identify a subset of genes that can potentially serve as predictive markers for cancer development and progression. Analyses of individual experiments may lead to unreliable gene selection results because of small sample sizes. Meta-analysis can be used to pool multiple experiments, increase statistical power, and achieve more reliable gene selection. The meta-analysis of cancer microarray data is challenging because of the high dimensionality of gene expressions and the differences in experimental settings among experiments. RESULTS: We propose a Meta Threshold Gradient Descent Regularization (MTGDR) approach for gene selection in the meta-analysis of cancer microarray data. The MTGDR has several advantages over existing approaches: it allows different experiments to have different experimental settings, it can account for the joint effects of multiple genes on cancer, and it can select the same set of cancer-associated genes across multiple experiments. Simulation studies and analyses of multiple pancreatic and liver cancer experiments demonstrate the superior performance of the MTGDR. CONCLUSION: The MTGDR provides an effective way of analyzing multiple cancer microarray studies and selecting reliable cancer-associated genes.
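The abstract does not spell out the MTGDR update, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a per-study least-squares loss (the paper may use a different loss), keeps a separate coefficient vector for each study so that settings and effect sizes can differ, and pools a hypothetical gene-level score, taken here as the sum of absolute per-study gradients, so that the threshold selects the same genes in every study. The function names `neg_gradient` and `meta_tgdr` and the parameters `tau`, `step`, and `n_steps` are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def neg_gradient(X: Matrix, y: List[float], beta: List[float]) -> List[float]:
    # Negative gradient of the (assumed) least-squares loss (1/2n)||y - X beta||^2:
    # g_j = (1/n) * sum_i x_ij * (y_i - x_i^T beta).
    n, p = len(X), len(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]


def meta_tgdr(studies: List[Tuple[Matrix, List[float]]],
              tau: float, step: float, n_steps: int) -> List[List[float]]:
    # studies: list of (X_m, y_m); all studies share the same p genes,
    # but sample sizes and coefficients are allowed to differ between studies.
    p = len(studies[0][0][0])
    betas = [[0.0] * p for _ in studies]  # one coefficient vector per study
    for _ in range(n_steps):
        grads = [neg_gradient(X, y, b) for (X, y), b in zip(studies, betas)]
        # Hypothetical pooled gene-level score: sum of |g_mj| over studies m.
        score = [sum(abs(g[j]) for g in grads) for j in range(p)]
        top = max(score)
        if top == 0.0:
            break  # no gene can improve the fit any further
        # Threshold step: gene j is active in every study, or in none.
        active = [s >= tau * top for s in score]
        for b, g in zip(betas, grads):
            for j in range(p):
                if active[j]:
                    b[j] += step * g[j]
    return betas
```

Because the active set is chosen from the pooled score, a gene either enters all study-specific models or none of them, while its coefficient is still estimated separately in each study. As in single-study threshold gradient descent, `tau` close to 1 updates only the strongest genes at each step and gives sparser paths, whereas smaller `tau` lets more genes move at once.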
BACKGROUND: Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have an inherent pathway structure, in which pathways are composed of multiple genes with coordinated functions. The goal of this study is to identify gene pathways with predictive power for breast cancer prognosis. Since this goal is fundamentally different from that of existing studies, a new pathway analysis method is proposed. RESULTS: The new method improves on existing alternatives in three respects. First, it can assess the predictive power of gene pathways, whereas existing methods tend to focus only on model-fitting accuracy. Second, it can account for the joint effects of multiple genes in a pathway, whereas existing methods tend to focus on the marginal effects of genes. Third, it can accommodate multiple heterogeneous datasets, whereas existing methods analyze only a single dataset. We analyze four breast cancer prognosis studies and identify 97 pathways with significant predictive power for prognosis, including important pathways missed by alternative methods. CONCLUSIONS: The proposed method provides a useful alternative to existing pathway analysis methods. The identified pathways can provide further insights into breast cancer prognosis.
We study the asymptotic properties of the adaptive Lasso estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase with the sample size. We consider variable selection using the adaptive Lasso, where the L1 norms in the penalty are re-weighted by data-dependent weights. We show that, if a reasonable initial estimator is available, under appropriate conditions, the adaptive Lasso correctly selects covariates with nonzero coefficients with probability converging to one, and that the estimators of nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has an oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004). In addition, under a partial orthogonality condition in which the covariates with zero coefficients are weakly correlated with the covariates with nonzero coefficients, marginal regression can be used to obtain the initial estimator. With this initial estimator, the adaptive Lasso has the oracle property even when the number of covariates is much larger than the sample size.
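The two-stage procedure described above (a marginal-regression initial estimator, then an L1 penalty re-weighted by it) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: it takes the objective to be (1/2n)||y - Xb||^2 + λ Σ_j w_j |b_j| with weights w_j = 1/|b̃_j|, assumes a centered response, and solves it with plain cyclic coordinate descent. The function names, the fixed iteration count `n_iter`, and the `eps` guard for zero initial estimates are hypothetical choices made for the example.

```python
from typing import List

Matrix = List[List[float]]


def marginal_estimates(X: Matrix, y: List[float]) -> List[float]:
    # Initial estimator from marginal (one-covariate-at-a-time) regression:
    # b_j = <x_j, y> / <x_j, x_j>.
    n, p = len(X), len(X[0])
    out = []
    for j in range(p):
        xy = sum(X[i][j] * y[i] for i in range(n))
        xx = sum(X[i][j] ** 2 for i in range(n))
        out.append(xy / xx)
    return out


def soft_threshold(z: float, t: float) -> float:
    # Proximal operator of t * |b|.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso(X: Matrix, y: List[float], lam: float, init: List[float],
                   n_iter: int = 200, eps: float = 1e-8) -> List[float]:
    # Minimizes (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j| with w_j = 1/|init_j|.
    # A zero initial estimate gets a huge weight (1/eps), which in effect
    # keeps that covariate out of the model.
    n, p = len(X), len(X[0])
    w = [1.0 / max(abs(b), eps) for b in init]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j^T (partial residual that excludes covariate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

For example, with the orthogonal design whose rows are `[1, 1, 1]`, `[-1, 1, -1]`, `[1, -1, -1]`, `[-1, -1, 1]` and `y = [2, -2, 2, -2]` (so only the first covariate matters), the marginal estimates are `[2, 0, 0]` and `adaptive_lasso(X, y, lam=0.1, init=...)` returns approximately `[1.95, 0, 0]`: the large weights on the zero initial estimates keep those covariates out, and the small weight on the true signal shrinks it only slightly.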