Beijing University of Technology
ORCID: 0000-0001-6046-8420Publishes on Cancer-related molecular mechanisms research, RNA modifications and cancer, RNA Research and Splicing. 334 papers and 24.5k citations.
Add your photo, update your bio, and get notified when your ranking changes.
It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, that demonstrated gene evolutionary divergence between vertebrates, and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. ORA (overrepresentation analysis), FCS (functional class scoring), and PT (pathway topology) approaches are three generations of GSE methods along the timeline of development. Previous versions of KOBAS provided services based on just the ORA method. Here we presented version 3.0 of KOBAS, which is named KOBAS-i (short for KOBAS intelligent version). It introduced a novel machine learning-based method we published earlier, CGPS, which incorporates seven FCS tools and two PT tools into a single ensemble score and intelligently prioritizes the relevant biological pathways. In addition, KOBAS has expanded the downstream exploratory visualization for selecting and understanding the enriched results. The tool constructs a novel view of cirFunMap, which presents different enriched terms and their correlations in a landscape. Finally, based on the previous version's framework, KOBAS increased the number of supported species from 1327 to 5944. For an easier local run, it also provides a prebuilt Docker image that requires no installation, as a supplementary to the source code version. KOBAS can be freely accessed at http://kobas.cbi.pku.edu.cn, and a mirror site is available at http://bioinfo.org/kobas.
NONCODE (http://www.bioinfo.org/noncode/) is an interactive database that aims to present the most complete collection and annotation of non-coding RNAs, especially long non-coding RNAs (lncRNAs). The recently reduced cost of RNA sequencing has produced an explosion of newly identified data. Revolutionary third-generation sequencing methods have also contributed to more accurate annotations. Accumulative experimental data also provides more comprehensive knowledge of lncRNA functions. In this update, NONCODE has added six new species, bringing the total to 16 species altogether. The lncRNAs in NONCODE have increased from 210 831 to 527,336. For human and mouse, the lncRNA numbers are 167,150 and 130,558, respectively. NONCODE 2016 has also introduced three important new features: (i) conservation annotation; (ii) the relationships between lncRNAs and diseases; and (iii) an interface to choose high-quality datasets through predicted scores, literature support and long-read sequencing method support. NONCODE is also accessible through http://www.noncode.org/.
Pharmacotranscriptomics has become a powerful approach for evaluating the therapeutic efficacy of drugs and discovering new drug targets. Recently, studies of traditional Chinese medicine (TCM) have increasingly turned to high-throughput transcriptomic screens for molecular effects of herbs/ingredients. And numerous studies have examined gene targets for herbs/ingredients, and link herbs/ingredients to various modern diseases. However, there is currently no systematic database organizing these data for TCM. Therefore, we built HERB, a high-throughput experiment- and reference-guided database of TCM, with its Chinese name as BenCaoZuJian. We re-analyzed 6164 gene expression profiles from 1037 high-throughput experiments evaluating TCM herbs/ingredients, and generated connections between TCM herbs/ingredients and 2837 modern drugs by mapping the comprehensive pharmacotranscriptomics dataset in HERB to CMap, the largest such dataset for modern drugs. Moreover, we manually curated 1241 gene targets and 494 modern diseases for 473 herbs/ingredients from 1966 references published recently, and cross-referenced this novel information to databases containing such data for drugs. Together with database mining and statistical inference, we linked 12 933 targets and 28 212 diseases to 7263 herbs and 49 258 ingredients and provided six pairwise relationships among them in HERB. In summary, HERB will intensively support the modernization of TCM and guide rational modern drug discovery efforts. And it is accessible through http://herb.ac.cn/.
Coming soon — researchers in similar fields and career stages