Yubin Sun

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types

Tingting Chen, Xu Chen, Sisi Zhang et al. | Genomics Proteomics & Bioinformatics | 2021

Cited by 1.9k | Open Access

The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. Considering explosive data growth with diverse data types, here we present the GSA family by expanding into a set of resources for raw data archive with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, all these resources form a family of resources dedicated to archiving explosive data with diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.

Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022

CNCB-NGDC Members and Partners, Yongbiao Xue, Yīmíng Bào et al. | Nucleic Acids Research | 2021

Cited by 1.1k | Open Access

The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global research in both academia and industry. With the explosively accumulated multi-omics data at ever-faster rates, CNCB-NGDC is constantly scaling up and updating its core database resources through big data archive, curation, integration and analysis. In the past year, efforts have been made to synthesize the growing data and knowledge, particularly in single-cell omics and precision medicine research, and a series of resources have been newly developed, updated and enhanced. Moreover, CNCB-NGDC has continued to daily update SARS-CoV-2 genome sequences, variants, haplotypes and literature. Particularly, OpenLB, an open library of bioscience, has been established by providing easy and open access to a substantial number of abstract texts from PubMed, bioRxiv and medRxiv. In addition, Database Commons is significantly updated by cataloguing a full list of global databases, and BLAST tools are newly deployed to provide online sequence search services. All these resources along with their services are publicly accessible at https://ngdc.cncb.ac.cn.

GSA: Genome Sequence Archive

Yanqing Wang, Fuhai Song, Junwei Zhu et al. | Genomics Proteomics & Bioinformatics | 2017

Cited by 833 | Open Access

With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are generated at an unprecedentedly explosive rate. To provide an efficient and easy-to-use platform for managing huge sequence data, here we present Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members by alleviating the increasing burdens of handling sequence data deluge, but also takes the significant responsibility for global big data archive and provides free unrestricted access to all publicly available data in support of research activities throughout the world.

The GSA Family in 2025: A Broadened Sharing Platform for Multi-omics and Multimodal Data

Sisi Zhang, Xu Chen, Enhui Jin et al. | Genomics Proteomics & Bioinformatics | 2025

Cited by 166 | Open Access

The Genome Sequence Archive family (GSA family) provides a comprehensive suite of database resources for archiving, retrieving, and sharing multi-omics data for the global academic and industrial communities. It currently comprises four distinct database members: the Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa), the Genome Sequence Archive for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human), the Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix), and the Open Biomedical Imaging Archive (OBIA, https://ngdc.cncb.ac.cn/obia). Compared to its 2021 version, the GSA family has expanded significantly by introducing a new repository, the OBIA, and by comprehensively upgrading the existing databases. Notable enhancements to the existing members include broadening the range of accepted data types, strengthening quality control systems, improving the data retrieval system, and refining data-sharing management mechanisms.

Database Resources of the BIG Data Center in 2018

Xingjian Xu, Lili Hao, Junwei Zhu et al. | Nucleic Acids Research | 2017

Cited by 159 | Open Access

The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn.


Top publications by citations