Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes
PURPOSE: To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS: A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients who received no systemic therapy were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. RESULTS: The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on that of either the clinicopathologic model or the subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%.
CONCLUSION: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.
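The abstract above reports two evaluation measures. The C-index (concordance index) is used for the prognostic models, and the negative predictive value (NPV) is used for pCR. The sketch below shows how each is commonly computed; it is not the authors' code. It makes three assumptions. First, it uses Harrell's form of the C-index. Second, it skips pairs with tied follow-up times for simplicity. Third, a "negative" prediction means a prediction that pCR will not occur. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up had an
    observed event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half a concordant pair.
    Pairs with tied follow-up times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # `a` is the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: true ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def negative_predictive_value(predicted_pcr: Sequence[bool],
                              observed_pcr: Sequence[bool]) -> float:
    """Fraction of patients predicted NOT to reach pCR who indeed did not."""
    pairs = list(zip(predicted_pcr, observed_pcr))
    true_neg = sum(1 for p, o in pairs if not p and not o)
    false_neg = sum(1 for p, o in pairs if not p and o)
    if true_neg + false_neg == 0:
        raise ValueError("no negative predictions")
    return true_neg / (true_neg + false_neg)


# Toy data (invented, not from the study): higher risk goes with earlier events.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risk = [0.9, 0.7, 0.3, 0.1]
print(harrell_c_index(times, events, risk))  # 1.0
print(negative_predictive_value([False, False, False, True],
                                [False, False, True, True]))  # 0.6666666666666666
```

A C-index of 0.5 corresponds to random ordering of patients by risk, and 1.0 to perfect ordering. Comparing models by this measure is how a "combined model" can be said to improve on its components.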
Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer
Aleix Prat, Joel S. Parker, Olga Karginova et al. | Breast Cancer Research | 2010
INTRODUCTION: In breast cancer, gene expression analyses have defined five tumor subtypes (luminal A, luminal B, HER2-enriched, basal-like, and claudin-low), each of which has unique biologic and prognostic features. Here, we comprehensively characterize the recently identified claudin-low tumor subtype. METHODS: The clinical, pathological, and biological features of claudin-low tumors were compared with those of the other tumor subtypes using an updated human tumor database and multiple independent data sets. The main features of claudin-low tumors were also evaluated in a panel of breast cancer cell lines and in genetically engineered mouse models. RESULTS: Claudin-low tumors are characterized by low to absent expression of luminal differentiation markers and by high enrichment for epithelial-to-mesenchymal transition markers, immune response genes, and cancer stem cell-like features. Clinically, most claudin-low tumors are poor-prognosis invasive ductal carcinomas that are estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and human epidermal growth factor receptor 2 (HER2)-negative (triple negative), with a high frequency of metaplastic and medullary differentiation. Their response rate to standard preoperative chemotherapy is intermediate between those of basal-like and luminal tumors. We also show that a group of widely used breast cancer cell lines, and several genetically engineered mouse models, express the claudin-low phenotype. Finally, we confirm that a prognostically relevant differentiation hierarchy exists across all breast cancers, in which the claudin-low subtype most closely resembles the mammary epithelial stem cell.
CONCLUSIONS: These results should help to improve our understanding of the biologic heterogeneity of breast cancer and provide tools for the further evaluation of the unique biology of claudin-low tumors and cell lines.
The molecular portraits of breast tumors are conserved across microarray platforms
Zhiyuan Hu, Cheng Fan, Daniel Oh et al. | BMC Genomics | 2006
BACKGROUND: Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk stratification. However, validation is often unconvincing because the test set is typically small. To overcome this problem, we used publicly available breast cancer gene expression data sets and a novel approach to data fusion to validate a new breast tumor intrinsic gene list. RESULTS: A 105-tumor training set containing 26 sample pairs was used to derive the new intrinsic list. This list contained 1300 genes and a proliferation signature that was not present in previous breast intrinsic gene sets. We tested the list as a survival predictor on a data set of 311 tumors compiled from three independent microarray studies, which were fused into a single data set using Distance Weighted Discrimination (DWD). When the new intrinsic gene set was used to hierarchically cluster this combined test set, tumors were grouped into the LumA, LumB, Basal-like, HER2+/ER-, and Normal Breast-like subtypes that we had described in previous data sets. These subtypes were associated with significant differences in relapse-free and overall survival. Multivariate Cox analysis of the combined test set showed that the intrinsic subtype classification added significant prognostic information independent of standard clinical predictors. From the combined test set, we developed an objective and unchanging classifier based on five intrinsic subtype mean expression profiles (i.e., centroids), designed for single sample prediction (SSP). The SSP approach was applied to two additional independent data sets and consistently predicted survival in both systemically treated and untreated patient groups.
CONCLUSION: This study validates the "breast tumor intrinsic" subtype classification as an objective means of tumor classification that should be translated into a clinical assay for further retrospective and prospective validation. In addition, our method of combining existing data sets can be used to robustly validate the potential clinical value of any new gene expression profile.
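The single sample predictor described above assigns each tumor to the subtype whose centroid (mean expression profile) it most resembles. The sketch below is a minimal illustration, not the published classifier. It makes three assumptions:

* Centroids are plain per-gene means over the training samples of each subtype.
* Spearman rank correlation is the similarity measure.
* The normalization and gene-centering steps a real assay would need can be omitted.

All function names and toy values are hypothetical.

```python
import numpy as np


def build_centroids(expr: np.ndarray, labels: np.ndarray) -> dict:
    """Per-subtype mean expression profile. `expr` is genes x samples."""
    return {lab: expr[:, labels == lab].mean(axis=1) for lab in np.unique(labels)}


def _average_ranks(x: np.ndarray) -> np.ndarray:
    """Ranks 1..n, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        if tied.sum() > 1:
            ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(_average_ranks(x), _average_ranks(y))[0, 1])


def classify_sample(sample: np.ndarray, centroids: dict) -> tuple:
    """Return the best-matching subtype and the correlation to every centroid."""
    scores = {name: spearman(sample, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Toy 4-gene centroids (invented values, not the published centroids).
centroids = {
    "LumA": np.array([2.0, 1.0, 0.0, -1.0]),
    "Basal-like": np.array([-1.0, 0.0, 1.0, 2.0]),
}
subtype, scores = classify_sample(np.array([1.8, 1.1, -0.2, -0.9]), centroids)
print(subtype)  # LumA
```

Because each sample is compared only with fixed centroids, the call for one tumor does not depend on which other tumors are in the data set. That independence is what makes the classifier usable one sample at a time.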
Residual breast cancers after conventional therapy display mesenchymal as well as tumor-initiating features
Chad J. Creighton, Xiaoxian Li, Melissa D. Landis et al. | Proceedings of the National Academy of Sciences | 2009
Some breast cancers have been shown to contain a small fraction of cells with a CD44(+)/CD24(-/low) cell-surface antigen profile and high tumor-initiating potential. In addition, breast cancer cells propagated in vitro as mammospheres (MSs) have been shown to be enriched for cells capable of self-renewal. In this study, we defined a gene expression signature common to both CD44(+)/CD24(-/low) and MS-forming cells. To examine its clinical significance, we determined whether tumor cells surviving conventional treatment were enriched for cells bearing this CD44(+)/CD24(-/low)-MS signature. The CD44(+)/CD24(-/low)-MS signature was found mainly in human breast tumors of the recently identified "claudin-low" molecular subtype, which is characterized by expression of many epithelial-mesenchymal transition (EMT)-associated genes. Both the CD44(+)/CD24(-/low)-MS and claudin-low signatures were more pronounced in tumor tissue remaining after either endocrine therapy (letrozole) or chemotherapy (docetaxel), consistent with the selective survival of tumor-initiating cells after treatment. We confirmed increased expression of mesenchymal markers, including vimentin (VIM) in cytokeratin-positive epithelial cells and matrix metalloproteinase 2 (MMP2), in two separate sets of post-letrozole versus pretreatment specimens. Taken together, these data provide supporting evidence that residual breast tumor cell populations surviving conventional treatment may be enriched for subpopulations of cells with both tumor-initiating and mesenchymal features. Targeting proteins involved in EMT may provide a therapeutic strategy for eliminating these surviving cells, preventing recurrence and improving long-term survival in breast cancer patients.
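The sketch below makes "more pronounced after treatment" concrete in two steps. It scores each specimen with a simple signature score: the mean expression of the signature's up-regulated genes minus the mean expression of its down-regulated genes. It then summarizes paired pre- versus post-treatment shifts with a paired t statistic. This scoring scheme is an assumption for illustration; the abstract does not state how the authors scored or tested the signature. The function names and the toy gene lists (VIM and MMP2 as "up", CDH1 as "down") are also assumptions.

```python
import numpy as np


def signature_score(expr, genes, up_genes, down_genes=()):
    """Per-sample score: mean(up genes) - mean(down genes).

    expr: genes x samples array of log-scale expression values.
    genes: gene names labeling the rows of `expr`.
    Genes missing from `genes` are silently ignored.
    """
    row = {g: i for i, g in enumerate(genes)}
    up = [row[g] for g in up_genes if g in row]
    down = [row[g] for g in down_genes if g in row]
    if not up:
        raise ValueError("none of the up-regulated genes are present")
    score = expr[up].mean(axis=0)
    if down:
        score = score - expr[down].mean(axis=0)
    return score


def paired_shift(pre_scores, post_scores):
    """Mean post-minus-pre difference and the paired t statistic."""
    diff = np.asarray(post_scores, dtype=float) - np.asarray(pre_scores, dtype=float)
    if diff.size < 2:
        raise ValueError("need at least two paired specimens")
    se = diff.std(ddof=1) / np.sqrt(diff.size)
    return float(diff.mean()), float(diff.mean() / se)


# Hypothetical toy data: 3 genes x 2 specimens.
genes = ["VIM", "MMP2", "CDH1"]
expr = np.array([[1.0, 2.0],
                 [1.0, 2.0],
                 [3.0, 3.0]])
print(signature_score(expr, genes, ["VIM", "MMP2"], ["CDH1"]))  # [-2. -1.]

# Hypothetical scores for three patients before and after treatment.
mean_shift, t_stat = paired_shift([1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
print(round(mean_shift, 3), round(t_stat, 3))  # 1.667 5.0
```

The comparison is paired because each patient serves as their own control. A positive mean shift indicates the signature is stronger in residual tissue. Converting the t statistic to a p-value (for example, with a t distribution on n - 1 degrees of freedom) is left out to keep the sketch dependency-free.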
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery
Kai Wang, Darshan Singh, Zheng Zeng et al. | Nucleic Acids Research | 2010
Accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. MapSplice does not depend on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of the read alignments supporting a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data, and experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extent of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. Together, these results indicate that MapSplice is a highly accurate algorithm for aligning RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.
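A read that spans a splice junction does not align contiguously to the genome: its two halves map to exons separated by an intron. The brute-force sketch below illustrates only that idea; it is not MapSplice's algorithm, which the abstract does not detail. The sketch works as follows:

* It splits the read at every position.
* It looks for exact matches of both halves, separated by at least `min_intron` bases.
* It reports the intron's boundary dinucleotides without requiring the canonical GT-AG motif, echoing the point that detection need not depend on splice site features.

The function name, its parameters, and the exact-match assumption are hypothetical. Reads that align without a gap are not reported, and sequencing errors are not tolerated.

```python
def find_spliced_alignments(reference: str, read: str,
                            min_anchor: int = 8, min_intron: int = 20):
    """Exact-match split alignments of `read` against `reference`.

    Returns (left_start, intron_start, intron_end, motif) tuples in 0-based,
    half-open coordinates: the intron is reference[intron_start:intron_end].
    Each half of the read must be at least `min_anchor` bases long.
    """
    hits = []
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        start = reference.find(left)
        while start != -1:
            intron_start = start + k
            # The right half must begin at least `min_intron` bases downstream.
            intron_end = reference.find(right, intron_start + min_intron)
            while intron_end != -1:
                intron = reference[intron_start:intron_end]
                motif = f"{intron[:2]}-{intron[-2:]}"  # e.g. canonical "GT-AG"
                hits.append((start, intron_start, intron_end, motif))
                intron_end = reference.find(right, intron_end + 1)
            start = reference.find(left, start + 1)
    return hits


# Toy sequences (invented): two 12-base exons around a 24-base intron.
exon1, exon2 = "ATGGCTAAGTTA", "TACCGATTGACA"
intron = "GT" + "C" * 20 + "AG"
reference = exon1 + intron + exon2
print(find_spliced_alignments(reference, exon1 + exon2))
# [(0, 12, 36, 'GT-AG')]
```

Replacing the intron with, say, `"CT" + "C" * 20 + "AC"` yields the same junction coordinates with motif `'CT-AC'`, since the search never consults the motif. Real aligners replace the quadratic exact search with indexed seed lookups and tolerate mismatches. A single junction can also be reported at several split points when bases at the exon and intron boundaries coincide.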