Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2023The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global academic and industrial communities. With the explosive accumulation of multi-omics data generated at an unprecedented rate, CNCB-NGDC constantly expands and updates core database resources by big data archive, integrative analysis and value-added curation. In the past year, efforts have been devoted to integrating multiple omics data, synthesizing the growing knowledge, developing new resources and upgrading a set of major resources. Particularly, several database resources are newly developed for infectious diseases and microbiology (MPoxVR, KGCoV, ProPan), cancer-trait association (ASCancer Atlas, TWAS Atlas, Brain Catalog, CCAS) as well as tropical plants (TCOD). Importantly, given the global health threat caused by monkeypox virus and SARS-CoV-2, CNCB-NGDC has newly constructed the monkeypox virus resource, along with frequent updates of SARS-CoV-2 genome sequences, variants as well as haplotypes. All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.
EWAS Open Platform: integrated data, knowledge and toolkit for epigenome-wide association studyZhuang Xiong, Fei Yang, Mengwei Li et al.|Nucleic Acids Research|2021 Epigenome-Wide Association Study (EWAS) has become a standard strategy to discover DNA methylation variation of different phenotypes. Since 2018, we have developed EWAS Atlas and EWAS Data Hub to integrate a growing volume of EWAS knowledge and data, respectively. Here, we present EWAS Open Platform (https://ngdc.cncb.ac.cn/ewas) that includes EWAS Atlas, EWAS Data Hub and the newly developed EWAS Toolkit. In the current implementation, EWAS Open Platform integrates 617 018 high-quality EWAS associations from 910 publications, covering 51 phenotypes, 275 diseases and 104 environmental factors. It also provides well-normalized DNA methylation array data and the corresponding metadata from 115 852 samples, which involve 707 tissues, 218 cell lines and 528 diseases. Taking advantage of integrated knowledge and data in EWAS Atlas and EWAS Data Hub, EWAS Open Platform equips with EWAS Toolkit, a powerful one-stop site for EWAS enrichment, annotation, and knowledge network construction and visualization. Collectively, EWAS Open Platform provides open access to EWAS knowledge, data and toolkit and thus bears great utility for a broader range of relevant research.
Cell Taxonomy: a curated repository of cell types with multifaceted characterizationShuai Jiang, Qiheng Qian, Tongtong Zhu et al.|Nucleic Acids Research|2022 Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Thus, synthesizing growing knowledge of cellular characteristics is critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Combined with literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization for cell types and cell markers, involving quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers. Taken together, Cell Taxonomy represents a fundamentally useful reference to systematically and accurately characterize cell types and thus lays an important foundation for deeply understanding and exploring cellular biology in diverse species.
ASCancer Atlas: a comprehensive knowledgebase of alternative splicing in human cancersSong Wu, Yue Huang, Mochen Zhang et al.|Nucleic Acids Research|2022 Alternative splicing (AS) is a fundamental process that governs almost all aspects of cellular functions, and dysregulation in this process has been implicated in tumor initiation, progression and treatment resistance. With accumulating studies of carcinogenic mis-splicing in cancers, there is an urgent demand to integrate cancer-associated splicing changes to better understand their internal cross-talks and functional consequences from a global view. However, a resource of key functional AS events in human cancers is still lacking. To fill the gap, we developed ASCancer Atlas (https://ngdc.cncb.ac.cn/ascancer), a comprehensive knowledgebase of aberrant splicing in human cancers. Compared to extant databases, ASCancer Atlas features a high-confidence collection of 2006 cancer-associated splicing events experimentally proved to promote tumorigenesis, a systematic splicing regulatory network, and a suit of multi-scale online analysis tools. For each event, we manually curated the functional axis including upstream splicing regulators, splicing event annotations, downstream oncogenic effects, and possible therapeutic strategies. ASCancer Atlas also houses about 2 million computationally putative splicing events. Additionally, a user-friendly web interface was built to enable users to easily browse, search, visualize, analyze, and download all splicing events. Overall, ASCancer Atlas provides a unique resource to study the functional roles of splicing dysregulation in human cancers.