Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2023
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global academic and industrial communities. With multi-omics data accumulating at an unprecedented rate, CNCB-NGDC constantly expands and updates its core database resources through big data archiving, integrative analysis and value-added curation. In the past year, efforts have been devoted to integrating multiple omics data, synthesizing the growing knowledge, developing new resources and upgrading a set of major resources. In particular, several database resources have been newly developed for infectious diseases and microbiology (MPoxVR, KGCoV, ProPan), cancer-trait association (ASCancer Atlas, TWAS Atlas, Brain Catalog, CCAS) as well as tropical plants (TCOD). Importantly, given the global health threat posed by monkeypox virus and SARS-CoV-2, CNCB-NGDC has newly constructed the monkeypox virus resource and frequently updates SARS-CoV-2 genome sequences, variants and haplotypes. All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.

The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR
Shuhui Song, Lina Ma, Dong Zou et al.|Genomics Proteomics & Bioinformatics|2020
On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. The spatiotemporal change of each variant can be visualized, and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of literature relevant to SARS-CoV-2 and the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, 2019nCoVR is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature in a timely manner, making it a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.

ASCancer Atlas: a comprehensive knowledgebase of alternative splicing in human cancers
Song Wu, Yue Huang, Mochen Zhang et al.|Nucleic Acids Research|2022
Alternative splicing (AS) is a fundamental process that governs almost all aspects of cellular functions, and dysregulation in this process has been implicated in tumor initiation, progression and treatment resistance. With accumulating studies of carcinogenic mis-splicing in cancers, there is an urgent demand to integrate cancer-associated splicing changes to better understand their internal cross-talk and functional consequences from a global view. However, a resource of key functional AS events in human cancers is still lacking. To fill the gap, we developed ASCancer Atlas (https://ngdc.cncb.ac.cn/ascancer), a comprehensive knowledgebase of aberrant splicing in human cancers. Compared to extant databases, ASCancer Atlas features a high-confidence collection of 2006 cancer-associated splicing events experimentally shown to promote tumorigenesis, a systematic splicing regulatory network, and a suite of multi-scale online analysis tools. For each event, we manually curated the functional axis including upstream splicing regulators, splicing event annotations, downstream oncogenic effects, and possible therapeutic strategies. ASCancer Atlas also houses about 2 million computationally putative splicing events. Additionally, a user-friendly web interface was built to enable users to easily browse, search, visualize, analyze, and download all splicing events. Overall, ASCancer Atlas provides a unique resource to study the functional roles of splicing dysregulation in human cancers.

scMethBank: a database for single-cell whole genome DNA methylation maps
Wenting Zong, Hongen Kang, Zhuang Xiong et al.|Nucleic Acids Research|2021
Single-cell bisulfite sequencing methods are widely used to assess epigenomic heterogeneity in cell states. Over the past few years, large amounts of data have been generated and have facilitated a deeper understanding of the epigenetic regulation of many key biological processes, including early embryonic development, cell differentiation and tumor progression. There is an urgent need to build a functional resource platform for this massive amount of data. Here, we present scMethBank, the first open-access and comprehensive database dedicated to the collection, integration, analysis and visualization of single-cell DNA methylation data and metadata. The current release of scMethBank includes processed single-cell bisulfite sequencing data and curated metadata of 8328 samples derived from 15 public single-cell datasets, involving two species (human and mouse), 29 cell types and two diseases. In summary, scMethBank aims to assist researchers who are interested in cell heterogeneity to explore and utilize whole genome methylation data at the single-cell level by providing browse, search, visualization and download functions along with user-friendly online tools. The database is accessible at: https://ngdc.cncb.ac.cn/methbank/scm/.

MethBank 4.0: an updated database of DNA methylation across a variety of species
Mochen Zhang, Wenting Zong, Dong Zou et al.|Nucleic Acids Research|2022
DNA methylation, as the most intensively studied epigenetic mark, regulates gene expression in numerous biological processes including development, aging, and disease. With the rapid accumulation of whole-genome bisulfite sequencing data, integrating, archiving, analyzing, and visualizing those data becomes critical. Since its first publication in 2015, MethBank has been continuously updated to include more DNA methylomes across more diverse species. Here, we present MethBank 4.0 (https://ngdc.cncb.ac.cn/methbank/), which reports an increase of 309% in data volume, with 1449 single-base resolution methylomes of 23 species, covering 236 tissues/cell lines and 15 biological contexts. Value-added information, such as more rigorous quality evaluation, more standardized metadata, and comprehensive downstream annotations, has been integrated into the new version. Moreover, expert-curated knowledge modules of featured differentially methylated genes associated with biological contexts and methylation analysis tools have been incorporated as new components of MethBank. In addition, MethBank 4.0 is equipped with a series of new web interfaces to browse, search, and visualize DNA methylation profiles and related information. With all these improvements, we believe the updated MethBank 4.0 will serve as a fundamental resource to provide a wide range of data services for the global research community.