A computational framework to trace tumor tissue-of-origin of 19 cancer types based on RNA sequencing
Abstract
Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose tissue-of-origin (TOO) cannot be identified by traditional methods. Most CUP patients have a poor prognosis because therapy targeted to the TOO cannot be applied. It is therefore critical to develop accurate computational methods to infer the TOO. Although qPCR- and microarray-based methods are effective in predicting the TOO for most cancer types, their overall prediction accuracy leaves room for improvement. Here, we propose a computational framework to trace the TOO of 19 cancer types based on RNA sequencing (RNA-seq). Specifically, we downloaded from TCGA the RNA-seq data of more than 7,000 tissue samples covering 19 cancer types with known TOO. Through feature selection, 90 genes were selected to train a random forest model for TOO inference; these 90 genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.55% across all cancer types. Furthermore, we collected an independent cohort of samples from GEO as a test set. The accuracy on this independent data is 74%, despite differences in experimental procedures and processing pipelines. In conclusion, we have developed an accurate and robust computational framework for identifying the TOO, which may be promising for clinical applications.
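The workflow described in the abstract (select a small gene panel, train a classifier, estimate accuracy by cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic rather than TCGA, the gene score is a simple between-class to within-class variance ratio, and a nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library. All function names (`make_toy_data`, `select_genes`, `fit_centroids`, `predict`, `stratified_folds`, `cross_val_accuracy`) are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def make_toy_data(n_per_class=30, n_genes=50, n_informative=5, seed=0):
    """Build a synthetic expression matrix (hypothetical data, not TCGA)."""
    rng = random.Random(seed)
    X, y = [], []
    for ci, label in enumerate(["tissue_A", "tissue_B", "tissue_C"]):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            # Only the first n_informative genes carry a tissue-specific shift.
            for g in range(n_informative):
                row[g] += 3.0 * ((ci + g) % 3)
            X.append(row)
            y.append(label)
    return X, y


def select_genes(X, y, k):
    """Rank genes by between-class / within-class variance; keep the top k."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for g in range(len(X[0])):
        class_means = [mean(r[g] for r in rows) for rows in by_class.values()]
        within = mean(pvariance([r[g] for r in rows]) for rows in by_class.values())
        scores.append((pvariance(class_means) / (within + 1e-9), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, genes):
    """Stand-in classifier: one mean expression profile per tissue."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append([row[g] for g in genes])
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in by_class.items()}


def predict(centroids, row, genes):
    """Assign the tissue whose centroid is closest in squared Euclidean distance."""
    v = [row[g] for g in genes]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))


def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_val_accuracy(X, y, k_genes=5, n_folds=5):
    """Cross-validated accuracy; genes are re-selected inside each training fold."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        genes = select_genes(X_train, y_train, k_genes)
        centroids = fit_centroids(X_train, y_train, genes)
        correct += sum(predict(centroids, X[i], genes) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected genes:", sorted(select_genes(X, y, 5)))
    print("cross-validated accuracy:", cross_val_accuracy(X, y))
```

Selecting genes inside each training fold, rather than once on the full dataset, keeps held-out samples from influencing the gene panel and so avoids an optimistic bias in the cross-validation estimate. Whether the original study performed selection this way is not stated in the abstract.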