A computational framework to trace tumor tissue-of-origin of 19 cancer types based on RNA sequencing

Bo Wang, Huandong Yang (Yidu Central Hospital of Weifang), Yanxiang Zhang, Tian Geng, Jialiang Yang
Research Square
March 18, 2022
Cited by 2
Open Access

Abstract

Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose tissue-of-origin (TOO) cannot be identified by traditional methods. Most CUP patients have a poor prognosis because therapy targeting the TOO cannot be applied. It is therefore critical to develop accurate computational methods to infer TOO. Although qPCR- and microarray-based methods are effective in predicting TOO for most cancer types, their overall prediction accuracy leaves room for improvement. Here, we propose a computational framework to trace the TOO of 19 cancer types based on RNA sequencing (RNA-seq). Specifically, we downloaded RNA-seq data for more than 7,000 tissue samples covering 19 cancer types with known TOO from TCGA. Through feature selection, 90 genes were selected to train a random forest model for TOO inference; these 90 genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.55% across all cancer types. Furthermore, we collected an independent cohort of samples from GEO as a test set. The accuracy on this independent data is 74%, despite differences in experimental procedures and pipelines. In conclusion, we developed an accurate and robust computational framework for identifying TOO, which may be promising for clinical applications.
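
The workflow described in the abstract (select 90 genes, train a random forest, estimate accuracy by cross-validation) can be sketched roughly as follows. This is only an illustrative sketch with scikit-learn, not the authors' code: the abstract does not say which feature-selection method was used, so ranking genes by random-forest importance is an assumption, as are the number of folds, the number of trees, and the synthetic expression matrix that stands in for the TCGA data. All function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def make_synthetic_expression(n_per_class=40, n_classes=4, n_genes=300,
                              n_markers=5, seed=0):
    """Toy stand-in for an expression matrix (samples x genes).

    Each class (tissue) gets its own block of marker genes with raised
    expression; every other gene is pure noise. Not real TCGA data.
    """
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for c in range(n_classes):
        X = rng.normal(0.0, 1.0, size=(n_per_class, n_genes))
        start = c * n_markers
        X[:, start:start + n_markers] += 3.0  # tissue-specific markers
        blocks.append(X)
        labels.append(np.full(n_per_class, c))
    return np.vstack(blocks), np.concatenate(labels)


def build_too_pipeline(n_selected_genes=90, seed=0):
    """Gene selection followed by a random forest classifier.

    Assumption: genes are ranked by random-forest importance and the top
    `n_selected_genes` are kept. threshold=-inf makes max_features the
    only selection criterion.
    """
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        max_features=n_selected_genes,
        threshold=-np.inf,
    )
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    return Pipeline([("select", selector), ("forest", forest)])


def cross_validate_too(X, y, n_splits=5, seed=0):
    """Per-fold accuracy from stratified k-fold cross-validation.

    Selection sits inside the pipeline, so it is refit on each training
    fold and the held-out fold never influences which genes are chosen.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(build_too_pipeline(seed=seed), X, y,
                           cv=cv, scoring="accuracy")


if __name__ == "__main__":
    X, y = make_synthetic_expression()
    scores = cross_validate_too(X, y)
    print("mean cross-validation accuracy:", scores.mean())
```

Keeping the selection step inside the pipeline is a deliberate choice in this sketch: selecting genes on the full dataset before cross-validation would leak information from the held-out folds and inflate the accuracy estimate. An independent cohort, such as the GEO samples mentioned above, would be scored separately by fitting the pipeline on all training samples and calling `score` on the external matrix, provided its genes are mapped to the same columns.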
