EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species|Tianshun Gao, Jiang Qian|Nucleic Acids Research|2019 Enhancers are distal cis-regulatory elements that activate the transcription of their target genes. They regulate a wide range of important biological functions and processes, including embryogenesis, development, and homeostasis. As more and more large-scale technologies were developed for enhancer identification, a comprehensive database is highly desirable for enhancer annotation based on various genome-wide profiling datasets across different species. Here, we present an updated database EnhancerAtlas 2.0 (http://www.enhanceratlas.org/indexv2.php), covering 586 tissue/cell types that include a large number of normal tissues, cancer cell lines, and cells at different development stages across nine species. Overall, the database contains 13 494 603 enhancers, which were obtained from 16 055 datasets using 12 high-throughput experiment methods (e.g. H3K4me1/H3K27ac, DNase-seq/ATAC-seq, P300, POLR2A, CAGE, ChIA-PET, GRO-seq, STARR-seq and MPRA). The updated version is a huge expansion of the first version, which only contains the enhancers in human cells. In addition, we predicted enhancer-target gene relationships in human, mouse and fly. Finally, the users can search enhancers and enhancer-target gene relationships through five user-friendly, interactive modules. We believe the new annotation of enhancers in EnhancerAtlas 2.0 will facilitate users to perform useful functional analysis of enhancers in various genomes.
Transcription factor and microRNA co-regulatory loops: important regulatory motifs in biological processes and diseases|Hongmei Zhang, Shihuan Kuang, Xiaoming Xiong et al.|Briefings in Bioinformatics|2013 Transcription factors (TFs) and microRNAs (miRNAs) can jointly regulate target gene expression in the forms of feed-forward loops (FFLs) or feedback loops (FBLs). These regulatory loops serve as important motifs in gene regulatory networks and play critical roles in multiple biological processes and different diseases. Major progress has been made in bioinformatics and experimental study for the TF and miRNA co-regulation in recent years. To further speed up its identification and functional study, it is indispensable to make a comprehensive review. In this article, we summarize the types of FFLs and FBLs and their identified methods. Then, we review the behaviors and functions for the experimentally identified loops according to biological processes and diseases. Future improvements and challenges are also discussed, which includes more powerful bioinformatics approaches and high-throughput technologies in TF and miRNA target prediction, and the integration of networks of multiple levels.
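As an illustration of the loop types named in the abstract above, the following is a minimal sketch of how FFLs and FBLs might be enumerated from lists of regulatory edges. It is not the method of the review; the function name `find_loops`, the edge-set representation, and the placeholder identifiers (`TF1`, `miR-a`, `GeneX`, and so on) are all assumptions made for this example. Only one FFL type is covered here: a TF that regulates a miRNA, with both regulating the same target gene.

```python
def find_loops(tf_to_mirna, tf_to_gene, mirna_to_gene, mirna_to_tf):
    """Enumerate simple TF-miRNA loops from sets of (regulator, target) edges.

    Hypothetical helper for illustration only.
    Returns (ffls, fbls):
      ffls: (tf, mirna, gene) where tf -> mirna, tf -> gene and mirna -> gene
      fbls: (tf, mirna) where tf -> mirna and mirna -> tf
    """
    ffls = []
    for tf, mirna in tf_to_mirna:
        for tf2, gene in tf_to_gene:
            # Same TF must regulate the gene, and the miRNA must target it too.
            if tf2 == tf and (mirna, gene) in mirna_to_gene:
                ffls.append((tf, mirna, gene))

    # A feedback loop needs the reverse edge from the miRNA back to the TF.
    fbls = [(tf, mirna) for tf, mirna in tf_to_mirna if (mirna, tf) in mirna_to_tf]
    return sorted(ffls), sorted(fbls)


# Placeholder edges (not real interactions).
tf_to_mirna = {("TF1", "miR-a"), ("TF2", "miR-b")}
tf_to_gene = {("TF1", "GeneX"), ("TF2", "GeneY")}
mirna_to_gene = {("miR-a", "GeneX")}
mirna_to_tf = {("miR-b", "TF2")}

ffls, fbls = find_loops(tf_to_mirna, tf_to_gene, mirna_to_gene, mirna_to_tf)
print(ffls)
print(fbls)
```

In practice the edge sets would come from TF and miRNA target prediction or experimental data, which is where the abstract locates the main remaining challenges.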
EnhancerAtlas: a resource for enhancer annotation and analysis in 105 human cell/tissue types|Tianshun Gao, Bing He, Sheng Liu et al.|Bioinformatics|2016 MOTIVATION: Multiple high-throughput approaches have recently been developed and allowed the discovery of enhancers on a genome scale in a single experiment. However, the datasets generated from these approaches are not fully utilized by the research community due to technical challenges such as lack of consensus enhancer annotation and integrative analytic tools. RESULTS: We developed an interactive database, EnhancerAtlas, which contains an atlas of 2,534,123 enhancers for 105 cell/tissue types. A consensus enhancer annotation was obtained for each cell by summation of independent experimental datasets with the relative weights derived from a cross-validation approach. Moreover, EnhancerAtlas provides a set of useful analytic tools that allow users to query and compare enhancers in a particular genomic region or associated with a gene of interest, and assign enhancers and their target genes from a custom dataset. AVAILABILITY AND IMPLEMENTATION: The database with analytic tools is available at http://www.enhanceratlas.org/ CONTACT: jiang.qian@jhmi.edu or tank1@email.chop.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
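The weighted-summation idea in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the EnhancerAtlas pipeline: the binned 0/1 signals, the example weights, the normalization by total weight, and the 0.5 threshold are all assumptions, and the weights here are simply given rather than derived by cross-validation.

```python
def consensus_scores(tracks, weights):
    """Weighted sum of per-bin signals across experimental tracks.

    tracks:  dict mapping track name -> list of per-bin signals (same length)
    weights: dict mapping track name -> relative weight (assumed given)
    Scores are divided by the total weight (an assumption of this sketch).
    """
    n_bins = len(next(iter(tracks.values())))
    total_weight = sum(weights[name] for name in tracks)
    scores = []
    for i in range(n_bins):
        weighted = sum(weights[name] * tracks[name][i] for name in tracks)
        scores.append(weighted / total_weight)
    return scores


def call_enhancer_bins(scores, threshold=0.5):
    """Return indices of bins whose consensus score meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy data: three tracks over four genomic bins (values are made up).
tracks = {
    "H3K27ac": [0, 1, 1, 0],
    "DNase-seq": [0, 1, 0, 0],
    "P300": [0, 1, 1, 1],
}
weights = {"H3K27ac": 0.5, "DNase-seq": 0.3, "P300": 0.2}

scores = consensus_scores(tracks, weights)
enhancer_bins = call_enhancer_bins(scores)
print(enhancer_bins)
```

A bin supported only by the lowest-weight track falls below the threshold, while bins supported by higher-weight tracks are retained, which is the practical effect of weighting the datasets rather than counting them equally.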
CPLM: a database of protein lysine modifications|Zexian Liu, Yongbo Wang, Tianshun Gao et al.|Nucleic Acids Research|2013 We reported an integrated database of Compendium of Protein Lysine Modifications (CPLM; http://cplm.biocuckoo.org) for protein lysine modifications (PLMs), which occur at active ε-amino groups of specific lysine residues in proteins and are critical for orchestrating various biological processes. The CPLM database was updated from our previously developed database of Compendium of Protein Lysine Acetylation (CPLA), which contained 7151 lysine acetylation sites in 3311 proteins. Here, we manually collected experimentally identified substrates and sites for 12 types of PLMs, including acetylation, ubiquitination, sumoylation, methylation, butyrylation, crotonylation, glycation, malonylation, phosphoglycerylation, propionylation, succinylation and pupylation. In total, the CPLM database contained 203,972 modification events on 189,919 modified lysines in 45,748 proteins for 122 species. With the dataset, we totally identified 76 types of co-occurrences of various PLMs on the same lysine residues, and the most abundant PLM crosstalk is between acetylation and ubiquitination. Up to 53.5% of acetylation and 33.1% of ubiquitination events co-occur at 10 746 lysine sites. Thus, the various PLM crosstalks suggested that a considerable proportion of lysines were competitively and dynamically regulated in a complicated manner. Taken together, the CPLM database can serve as a useful resource for further research of PLMs.
UUCD: a family-based database of ubiquitin and ubiquitin-like conjugation|Tianshun Gao, Zexian Liu, Yongbo Wang et al.|Nucleic Acids Research|2012 In this work, we developed a family-based database of UUCD (http://uucd.biocuckoo.org) for ubiquitin and ubiquitin-like conjugation, which is one of the most important post-translational modifications responsible for regulating a variety of cellular processes, through a similar E1 (ubiquitin-activating enzyme)-E2 (ubiquitin-conjugating enzyme)-E3 (ubiquitin-protein ligase) enzyme thioester cascade. Although extensive experimental efforts have been taken, an integrative data resource is still not available. From the scientific literature, 26 E1s, 105 E2s, 1003 E3s and 148 deubiquitination enzymes (DUBs) were collected and classified into 1, 3, 19 and 7 families, respectively. To computationally characterize potential enzymes in eukaryotes, we constructed 1, 1, 15 and 6 hidden Markov model (HMM) profiles for E1s, E2s, E3s and DUBs at the family level, separately. Moreover, the ortholog searches were conducted for E3 and DUB families without HMM profiles. Then the UUCD database was developed with 738 E1s, 2937 E2s, 46 631 E3s and 6647 DUBs of 70 eukaryotic species. The detailed annotations and classifications were also provided. The online service of UUCD was implemented in PHP + MySQL + JavaScript + Perl.