Chunman Zuo

Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data

Chunman Zuo, Luonan Chen|Briefings in Bioinformatics|2020

Cited by 147 | Open Access

Simultaneous profiling of transcriptomic and chromatin accessibility information in the same individual cells offers unprecedented resolution for understanding cell states. However, computationally effective methods for integrating these inherently sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model, which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn joint latent features that accurately represent these multilayer profiles. Studies on both simulated and real datasets demonstrate that it has a superior capability for (i) dissecting cellular heterogeneity in the joint-learning space, (ii) denoising and imputing data and (iii) constructing the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.

Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data

Chunman Zuo, Hao Dai, Luonan Chen|Bioinformatics|2021

Cited by 92

MOTIVATION: Joint profiling of single-cell transcriptomics and epigenomics data enables us to characterize cell states and transcriptomics regulatory programs related to cellular heterogeneity. However, the large differences in sparsity, heterogeneity and dimensionality between multi-omics data have severely hindered their integrative analysis. RESULTS: We propose the deep cross-omics cycle attention (DCCA) model, a computational tool for joint analysis of single-cell multi-omics data that combines variational autoencoders (VAEs) and attention-transfer. Specifically, we show that DCCA can leverage one omics data type to fine-tune the network trained for another, given a dataset of parallel multi-omics data within the same cell. In studies on both simulated and real datasets from various platforms, DCCA demonstrates superior capability in (i) dissecting cellular heterogeneity; (ii) denoising and aggregating data and (iii) constructing the link between multi-omics data, which is used to infer new transcriptional regulatory relations. In our applications, DCCA showed superior power to generate missing stages or omics in a biologically meaningful manner, which provides a new way to analyze and understand complicated biological processes. AVAILABILITY AND IMPLEMENTATION: DCCA source code is available at https://github.com/cmzuo11/DCCA, and has been deposited in archived format at https://doi.org/10.5281/zenodo.4762065. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Revealing the transcriptomic complexity of switchgrass by PacBio long-read sequencing

Chunman Zuo, Matthew J. Blow, Avinash Sreedasyam et al.|Biotechnology for Biofuels|2018

Cited by 69 | Open Access

BACKGROUND: Switchgrass (Panicum virgatum L.) is an important bioenergy crop widely used for lignocellulosic research. While extensive transcriptomic analyses have been conducted on this species using short read-based sequencing techniques, very little has been reliably derived regarding alternatively spliced (AS) transcripts. RESULTS: We present an analysis of transcriptomes of six switchgrass tissue types pooled together, sequenced using Pacific Biosciences (PacBio) single-molecule long-read technology. Our analysis identified 105,419 unique transcripts covering 43,570 known genes and 8795 previously unknown genes. Of these, 45,168 are novel transcripts of known genes. A total of 60,096 AS transcripts are identified, 45,628 being novel. We have also predicted 1549 transcripts of genes involved in cell wall construction and remodeling, 639 being novel transcripts of known cell wall genes. Most of the predicted transcripts are validated against Illumina-based short reads. Specifically, 96% of the splice junction sites in all the unique transcripts are validated by at least five Illumina reads. Comparisons between genes derived from our identified transcripts and the current genome annotation revealed that among the gene set predicted by both analyses, 16,640 have different exon-intron structures. CONCLUSIONS: Overall, a substantial amount of new information is derived from the PacBio RNA data regarding both the transcriptome and the genome of switchgrass.

Classifying Breast Cancer Subtypes Using Multiple Kernel Learning Based on Omics Data

Mingxin Tao, Tianci Song, Wei Du et al.|Genes|2019

Cited by 61 | Open Access

Exploring the intrinsic differences among breast cancer subtypes is important, because these differences are closely related to clinical diagnosis and the design of treatment plans. With the accumulation of biological and medical datasets, many different omics data are available that can be viewed from different aspects. Combining these multiple omics data can improve the accuracy of prediction. Meanwhile, many different databases are available from which different types of omics data can be downloaded. In this article, we use estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) to define breast cancer subtypes and classify any two breast cancer subtypes using the SMO-MKL algorithm. We collected mRNA data, methylation data and copy number variation (CNV) data from TCGA to classify breast cancer subtypes. Multiple Kernel Learning (MKL) is employed to use these omics data distinctly. The result of using three omics data with multiple kernels is better than that of using single omics data with multiple kernels. Furthermore, the significant genes and pathways discovered in the feature selection process are also analyzed. In experiments, the proposed method outperforms other state-of-the-art methods and has abundant biological interpretations.

Unravelling tumour spatiotemporal heterogeneity using spatial multimodal data

Chunman Zuo, Junchao Zhu, Jiawei Zou et al.|Clinical and Translational Medicine|2025

Cited by 13 | Open Access

Analysing the genome, epigenome, transcriptome, proteome, and metabolome within the spatial context of cells has transformed our understanding of tumour spatiotemporal heterogeneity. Advances in spatial multi-omics technologies now reveal complex molecular interactions shaping cellular behaviour and tissue dynamics. This review highlights key technologies and computational methods that have advanced spatial domain identification and their pseudo-relations, as well as inference of intra- and inter-cellular molecular networks that drive disease progression. We also discuss strategies to address major challenges, including data sparsity, high-dimensionality, scalability, and heterogeneity. Furthermore, we outline how spatial multi-omics enables novel insights into disease mechanisms, advancing precision medicine and informing targeted therapies. KEY POINTS: Advancements in spatial multi-omics facilitate our understanding of tumour spatiotemporal heterogeneity. AI-driven multimodal models uncover complex molecular interactions that underlie cellular behaviours and tissue dynamics. Combining multi-omics technologies and AI-enabled bioinformatics tools helps predict critical disease stages, such as pre-cancer, advancing precision medicine, and informing targeted therapeutic strategies.


Top publications by citations