Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data

Yifan Zhao; Huiyu Cai; Zuobai Zhang; Jian Tang; Yue Li

doi:10.1038/s41467-021-25534-2

Yifan Zhao (Harvard–MIT Division of Health Sciences and Technology), Huiyu Cai (Peking University), Zuobai Zhang (Fudan University), Jian Tang (HEC Montréal), Yue Li (McGill University)

Nature Communications

September 6, 2021

Abstract

The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge, largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of existing computational methods. We present the single-cell Embedded Topic Model (scETM). Our key contribution is to combine a transferable neural-network-based encoder with an interpretable linear decoder obtained via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer the cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
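
To make the tri-factorized linear decoder more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the decoder multiplies a cell-by-topic mixture, a topic-embedding matrix, and a gene-embedding matrix, adds a per-batch intercept for each gene, and normalizes with a softmax over genes. The abstract does not state the normalization, the matrix shapes, or any of the names used here (`theta`, `alpha`, `rho`, `batch_intercepts`, `decode`), so all of these are illustrative assumptions. Random values stand in for the encoder output.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def decode(theta, alpha, rho, batch_intercepts, batch_ids):
    """Hypothetical linear decoder built from a matrix tri-factorization.

    theta:            (n_cells, n_topics)  cell-by-topic mixture
    alpha:            (n_topics, emb_dim)  topic embeddings
    rho:              (emb_dim, n_genes)   gene embeddings
    batch_intercepts: (n_batches, n_genes) per-batch linear intercepts
    batch_ids:        (n_cells,)           batch index of each cell
    """
    # Product of the three factors, plus the intercept row of each cell's batch.
    logits = theta @ alpha @ rho + batch_intercepts[batch_ids]
    # Assumed normalization: softmax over genes, one distribution per cell.
    return softmax(logits, axis=1)


# Toy dimensions and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_cells, n_topics, emb_dim, n_genes, n_batches = 5, 3, 4, 6, 2

# In the model an encoder network would infer theta; random values stand in here.
theta = softmax(rng.normal(size=(n_cells, n_topics)), axis=1)
alpha = rng.normal(size=(n_topics, emb_dim))
rho = rng.normal(size=(emb_dim, n_genes))
batch_intercepts = rng.normal(size=(n_batches, n_genes))
batch_ids = np.array([0, 0, 1, 1, 0])

rates = decode(theta, alpha, rho, batch_intercepts, batch_ids)
print(rates.shape)
```

Under these assumptions every decoder operation before the softmax is linear, so each topic's gene scores can be read directly from the row `alpha[k] @ rho`, which is the kind of interpretability the abstract attributes to the linear decoder.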
