Database Resources of the National Genomics Data Center in 2020
Zhang Zhang, Wenming Zhao, Jingfa Xiao et al.|Nucleic Acids Research|2019
The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies and accordingly the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, efforts for update have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely, eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
Computational Approaches and Challenges in Spatial Transcriptomics
Shuangsang Fang, Bichao Chen, Yong Zhang et al.|Genomics Proteomics & Bioinformatics|2022
The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.
SAW: an efficient and accurate data analysis workflow for Stereo-seq spatial transcriptomics
The basic analysis steps of spatial transcriptomics require obtaining gene expression information from both space and cells. The existing tools for these analyses incur performance issues when dealing with large datasets. These issues involve computationally intensive spatial localization, RNA genome alignment, and excessive memory usage in large chip scenarios. These problems affect the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, called Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. The execution time for the entire analysis is ∼148 min with 1 GB reads 1 × 1 cm chip test data, 1.8 times faster than with an unoptimized workflow.
Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics
Shuangsang Fang, Mengyang Xu, Lei Cao et al.|Nature Communications|2025
Understanding complex biological systems requires tracing cellular dynamic changes across conditions, time, and space. However, integrating multi-sample data in a unified way to explore cellular heterogeneity remains challenging. Here, we present Stereopy, a flexible framework for modeling and dissecting comparative and spatiotemporal patterns in multi-sample spatial transcriptomics with interactive data visualization. To optimize this framework, we devise a universal container, a scope controller, and an integrative transformer tailored for multi-sample multimodal data storage, management, and processing. Stereopy showcases three representative applications: investigating specific cell communities and genes responsible for pathological changes, detecting spatiotemporal gene patterns by considering spatial and temporal features, and inferring a three-dimensional niche-based cell-gene interaction network that bridges intercellular communications and intracellular regulations. Stereopy serves as both a comprehensive bioinformatics toolbox and an extensible framework that empowers researchers with enhanced data interpretation abilities and new perspectives for mining multi-sample spatial transcriptomics data. Tracing cellular changes in complex biological systems is challenging. Here, authors present a flexible framework that integrates multi-sample data with in-house algorithms to infer comparative and spatiotemporal cell-gene patterns, advancing understanding of cellular dynamics.
Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics
Shuangsang Fang, Mengyang Xu, Lei Cao et al.|bioRxiv (Cold Spring Harbor Laboratory)|2023
Tracing cellular dynamic changes across conditions, time, and space is crucial for understanding the molecular mechanisms underlying complex biological systems. However, integrating multi-sample data in a unified and flexible way to explore cellular heterogeneity remains a major challenge. Here, we present Stereopy, a flexible and versatile framework for modeling and dissecting comparative and spatiotemporal patterns in multi-sample spatial transcriptomics with interactive data visualization. To optimize this flexible framework, we have developed three key components: a multi-sample tailored data container, a scope controller, and an analysis transformer. Furthermore, Stereopy showcases three transformative applications supported by pivotal algorithms. Firstly, the multi-sample cell community detection (CCD) algorithm introduces an innovative capability to detect specific cell communities and identify genes responsible for pathological changes in comparable datasets. Secondly, the spatially resolved temporal gene pattern inference (TGPI) algorithm represents a notable advancement in detecting important spatiotemporal gene patterns while concurrently considering spatial and temporal features, which enhances the identification of important genes, domains and regulatory factors closely associated with temporal datasets. Finally, the 3D niche-based regulation inference tool, named NicheReg3D, reconstructs the 3D cell niches to enable the inference of a cell-gene interaction network within the spatial texture, thus bridging intercellular communications and intracellular regulations to unravel the intricate regulatory mechanisms that govern cellular behavior. Overall, Stereopy serves as both a bioinformatics toolbox and an extensible framework that provides researchers with enhanced data interpretation abilities and new perspectives for mining multi-sample spatial transcriptomics data.