GSA: Genome Sequence Archive
Yanqing Wang, Fuhai Song, Junwei Zhu et al.|Genomics Proteomics & Bioinformatics|2017
With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing huge volumes of sequence data, we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members, alleviating the growing burden of handling the sequence data deluge, but also takes on significant responsibility for global big-data archiving and provides free unrestricted access to all publicly available data in support of research activities throughout the world.
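The four data objects named in the abstract above can be pictured as a small linked hierarchy. The following is a minimal sketch, assuming the usual INSDC-style linkage (an Experiment refers to one BioProject and one BioSample, and a Run belongs to one Experiment). The field names, the helper `runs_for_project`, and the accession strings are hypothetical and are not taken from the GSA schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BioProject:
    accession: str  # hypothetical placeholder, not a real GSA accession
    title: str


@dataclass
class BioSample:
    accession: str
    organism: str


@dataclass
class Run:
    accession: str
    files: List[str] = field(default_factory=list)  # raw read files


@dataclass
class Experiment:
    accession: str
    bioproject: str  # accession of the owning BioProject (assumed linkage)
    biosample: str   # accession of the sequenced BioSample (assumed linkage)
    platform: str
    runs: List[Run] = field(default_factory=list)


def runs_for_project(experiments: List[Experiment], project_acc: str) -> List[str]:
    """Collect the Run accessions reachable from one BioProject."""
    return [
        run.accession
        for exp in experiments
        if exp.bioproject == project_acc
        for run in exp.runs
    ]


# Illustrative records only; every identifier below is made up.
project = BioProject("PROJ001", "Example resequencing project")
sample = BioSample("SAMP001", "Oryza sativa")
experiments = [
    Experiment("EXP001", project.accession, sample.accession, "Illumina",
               runs=[Run("RUN001", ["r1_1.fq.gz", "r1_2.fq.gz"]), Run("RUN002")]),
    Experiment("EXP002", "PROJ999", sample.accession, "PacBio",
               runs=[Run("RUN003")]),
]
project_runs = runs_for_project(experiments, project.accession)
```

Storing accessions rather than object references on `Experiment` mirrors how metadata records are typically cross-referenced in a submission, though the real repository may model this differently.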
Database Resources of the BIG Data Center in 2018
Xingjian Xu, Lili Hao, Junwei Zhu et al.|Nucleic Acids Research|2017
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn.
AlzBase: an Integrative Database for Gene Dysregulation in Alzheimer’s Disease
Zhouxian Bai, Guangchun Han, Bin Xie et al.|Molecular Neurobiology|2014
Hidden Risk Genes with High-Order Intragenic Epistasis in Alzheimer's Disease
Jiya Sun, Fuhai Song, Jiajia Wang et al.|Journal of Alzheimer's Disease|2014
Meta-analysis of data from genome-wide association studies (GWAS) of Alzheimer's disease (AD) has confirmed the high risk conferred by APOE and identified twenty other risk genes/loci with moderate effect size. However, many more risk genes/loci remain to be discovered to account for the missing heritability. The contributions from individual single-nucleotide polymorphisms (SNPs) have been thoroughly examined in traditional GWAS data analysis, while SNP-SNP interactions can be explored by a variety of alternative approaches. Here we applied generalized multifactor dimensionality reduction to the re-analysis of four publicly available GWAS datasets for AD. When considering fourth-order intragenic SNP interactions, we observed high consistency of discovered potential risk genes among the four independent GWAS datasets. Ten potential risk genes were observed across all four datasets, including PDE1A, RYR3, TEK, SLC25A21, LOC729852, KIRREL3, PTPN5, FSHR, PARK2, and NR3C2. These potential risk genes discovered by generalized multifactor dimensionality reduction are highly relevant to AD pathogenesis based on multiple layers of evidence. The genetic contributions of these genes warrant further confirmation in other independent GWAS datasets for AD.
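To make the idea of reducing a multi-SNP genotype to a single risk label more concrete, here is a minimal sketch of the classic count-based multifactor dimensionality reduction step. It is not the generalized variant used in the study above, which, as far as we understand it, replaces raw case/control counts with score-based statistics. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Genotype = Tuple[int, ...]  # one coded genotype (0/1/2) per SNP in the combination


def mdr_high_risk_cells(genotypes: Sequence[Genotype],
                        is_case: Sequence[bool]) -> Dict[Genotype, bool]:
    """Label each multi-locus genotype cell as high risk (True) or low risk (False).

    A cell is high risk when its case:control ratio exceeds the overall
    case:control ratio of the dataset.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    counts: Dict[Genotype, List[int]] = defaultdict(lambda: [0, 0])
    for genotype, case in zip(genotypes, is_case):
        counts[genotype][0 if case else 1] += 1
    # Cross-multiply instead of dividing so empty control cells are handled.
    return {g: cases * n_controls > controls * n_cases
            for g, (cases, controls) in counts.items()}


def balanced_accuracy(genotypes: Sequence[Genotype],
                      is_case: Sequence[bool],
                      labels: Dict[Genotype, bool]) -> float:
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp = fn = tn = fp = 0
    for genotype, case in zip(genotypes, is_case):
        predicted = labels.get(genotype, False)  # unseen cells default to low risk
        if case:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Toy data: six individuals, one four-SNP combination (a fourth-order model).
toy_genotypes = [(0, 0, 1, 2)] * 3 + [(1, 1, 0, 0)] * 3
toy_is_case = [True, True, False, False, False, True]
toy_labels = mdr_high_risk_cells(toy_genotypes, toy_is_case)
toy_score = balanced_accuracy(toy_genotypes, toy_is_case, toy_labels)
```

In a full analysis the labelling would be fitted on training folds and scored on held-out folds for every candidate SNP combination; this sketch fits and scores on the same toy data purely for illustration.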
Replanting Affects the Tree Growth and Fruit Quality of Gala Apple
En-tai LIU, Gong-shuai WANG, Yuanyuan Li et al.|Journal of Integrative Agriculture|2014
Apple replant disease (ARD) inhibits root system development and stunts tree growth, among other effects. To further investigate the effects of ARD on apple fruits, a 25-year-old apple orchard was remediated to establish a replant orchard between November 2008 and March 2009. A rotational cropping orchard was established on an adjacent wheat field. The cultivar and rootstock-scion combination used in the newly established orchards was Royal Gala/M26/Malus hupehensis Rehd. Ripe fruits were collected in mid-August 2011 and mid-August 2012, and the following indices were measured: yield per plant; fruit weight; the fruit shape index; the contents of anthocyanin, carotenoid and chlorophyll; the soluble sugar content in the flesh; titratable acid; the sugar-acid ratio; firmness; aroma components; plant ground diameter; plant height increment; and the total length of the current-year shoots. The results showed that, compared to rotational cropping, continuous cropping yielded statistically significant reductions in fruit weight and yield per plant of 39.8 and 76.5%, respectively. However, there were no changes in the fruit shape index. The anthocyanin and carotenoid contents decreased by 81.7 and 37.7%, respectively, while the chlorophyll content increased by 251.0%. All of these differences in content were statistically significant. The soluble sugar levels and sugar-acid ratio decreased by 25.4 and 60.9%, respectively, but the titratable acid levels and fruit firmness increased by 90.9 and 42.8%, respectively.
Ten of the most important esters contributing to the apple aroma were analyzed, and the following changes were observed: hexyl acetate, butyl acetate, hexyl butyrate, acetate-2-methyl butyl, 2-methyl-hexyl butyrate, amyl acetate, butyl butyrate, 2-methyl-butyl butyrate, hexyl propionate and hexyl hexanoate decreased by 25.5, 78.4, 89.1, 55.5, 79.5, 77.2, 86.8, 69.9, 61.2, and 68.1%, respectively. The contents of three other aroma components, (E)-2-hexenal, hexanal and 1-hexanol, significantly increased. Eight characteristic aroma components were found in the rotational cropping fruits: hexyl acetate, butyl acetate, acetate-2-methyl butyl, 2-methyl-hexyl butyrate, amyl acetate, 2-methyl-butyl butyrate, hexyl acetate and hexyl propionate. There were four characteristic ester components (hexyl acetate, butyl acetate, acetate-2-methyl butyl, 2-methyl-hexyl butyrate) and two characteristic aldehyde aroma components ((E)-2-hexenal and hexanal) in the continuous cropping fruits. Compared with the rotational cropping fruits, the four characteristic ester components declined and the two characteristic aldehyde aroma components increased. Compared with the control, the ground diameter, plant height increment and total length of the current-year shoots of replanted apple plants were reduced by 27.6, 40.6 and 72.2%, respectively.