Large Scale Foundation Model on Single-cell Transcriptomics

Minsheng Hao (Tsinghua University), Jing Gong, Xin Zeng, Chi-Ming Liu, Yucheng Guo, Xingyi Cheng, Taifeng Wang, Jianzhu Ma (Tsinghua University), Le Song (Mohamed bin Zayed University of Artificial Intelligence), Xuegong Zhang (Tsinghua University)
bioRxiv (Cold Spring Harbor Laboratory)
May 31, 2023
Cited by 113
Open Access

Abstract

Large-scale pretrained models have become foundation models, leading to breakthroughs in natural language processing and related fields. Developing foundation models in life science for deciphering the “languages” of cells and facilitating biomedical research is promising yet challenging. We developed scFoundation, a large-scale pretrained model with 100M parameters, for this purpose. scFoundation was trained on over 50 million human single-cell transcriptomic profiles, which contain high-throughput observations of the complex molecular features in all known types of cells. scFoundation is currently the largest model in terms of the number of trainable parameters, the dimensionality of genes, and the number of cells used in pretraining. Experiments showed that scFoundation can serve as a foundation model for single-cell transcriptomics and achieves state-of-the-art performance in a diverse array of downstream tasks, such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, and single-cell perturbation prediction.

