Wenming Zhao

Database Resources of the BIG Data Center in 2019

Zhang Zhang, Wenming Zhao, Jingfa Xiao et al.|Nucleic Acids Research|2018

Cited by 147|Open Access

The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of multi-omics data generated at unprecedented scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. Resources with significant updates in the past year include BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Science Wikis (a catalog of biological knowledge wikis for community annotations) and IC4R (Information Commons for Rice). Newly released resources include EWAS Atlas (a knowledgebase of epigenome-wide association studies), iDog (an integrated omics data resource for dog) and RNA editing resources (for editome-disease associations and plant RNA editosome, respectively). To promote biodiversity and health big data sharing around the world, the Open Biodiversity and Health Big Data (BHBD) initiative is introduced. All of these resources are publicly accessible at http://bigd.big.ac.cn.

Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2

Dalang Yu, Xiao Yang, Bixia Tang et al.|Briefings in Bioinformatics|2021

Cited by 24|Open Access

Genomic epidemiology is important to study the COVID-19 pandemic, and more than two million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequences were deposited into public databases. However, the exponential increase of sequences invokes unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB) based on a highly efficient analysis framework and a node-picking rendering strategy. In total, 1,002,739 high-quality genomic sequences with the transmission-related metadata were analyzed and visualized. The size of the core data file is only 12.20 MB, highly efficient for clean data sharing. Quick visualization modules and rich interactive operations are provided to explore the annotated SARS-CoV-2 evolutionary tree. CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered out according to the user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be easily performed, such as the detection of accelerated evolution and ongoing positive selection. Moreover, the 75 genomic spots conserved in SARS-CoV-2 but non-conserved in other coronaviruses were identified, which may indicate the functional elements specifically important for SARS-CoV-2. The CGB was written in Java and JavaScript. It not only enables users who have no programming skills to analyze millions of genomic sequences, but also offers a panoramic vision of the transmission and evolution of SARS-CoV-2.

Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2026

CNCB–NGDC Members and Partners, Yīmíng Bào, Zhang Zhang et al.|Nucleic Acids Research|2025

Cited by 6|Open Access

The National Genomics Data Center (NGDC), as part of the China National Center for Bioinformation (CNCB), provides a suite of database resources for worldwide researchers. As multi-omics big data and artificial intelligence reshape the paradigm of biology research, CNCB-NGDC continuously updates its database resources to enhance data usability, foster knowledge discovery, and support data-driven innovative research. Over the past year, notable progress has been achieved in expanding the scope of high-quality multi-omics datasets, building new database resources, and optimizing extant core resources. Notably, the launch of BIG Search enables cross-database search services for large-scale biological data platforms, including NGDC, National Center for Biotechnology Information (NCBI), and European Bioinformatics Institute (EBI). Additionally, several new resources have been developed, covering genome and variation (Hiland Resource, TOAnnoPriDB), expression (TEDD), single-cell omics (PreDigs, scMultiModalMap, TE-SCALE), radiomics (TonguExpert), health and disease (CAVDdb, IDP, MTB-KB, ResMicroDb), biodiversity and biosynthesis (SugarcaneOmics), as well as research tools (Dingent, miMatch, OmniExtract, RDBSB, xMarkerFinder). All these resources and services are freely accessible at https://ngdc.cncb.ac.cn.

Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2

Dalang Yu, Xiao Yang, Bixia Tang et al.|medRxiv|2020

Cited by 3|Open Access

Genomic epidemiology is important to study the COVID-19 pandemic and more than two million SARS-CoV-2 genomic sequences were deposited into public databases. However, the exponential increase of sequences invokes unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB) based on a highly efficient analysis framework and a movie maker strategy. In total, 1,002,739 high-quality genomic sequences with the transmission-related metadata were analyzed and visualized. The size of the core data file is only 12.20 MB, efficient for clean data sharing. Quick visualization modules and rich interactive operations are provided to explore the annotated SARS-CoV-2 evolutionary tree. CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered out according to the user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be easily performed, such as the detection of accelerated evolution and on-going positive selection. Moreover, the 75 genomic spots conserved in SARS-CoV-2 but non-conserved in other coronaviruses were identified, which may indicate the functional elements specifically important for SARS-CoV-2. The CGB not only enables users who have no programming skills to analyze millions of genomic sequences, but also offers a panoramic vision of the transmission and evolution of SARS-CoV-2.

Top publications by citations