RSeQC: quality control of RNA-seq experiments
MOTIVATION: RNA-seq has been used extensively for transcriptome studies. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task because of the massive size and versatile nature of RNA-seq data. A convenient and comprehensive QC tool to assess RNA-seq quality is therefore sorely needed. RESULTS: We developed the RSeQC package to comprehensively evaluate different aspects of RNA-seq experiments, such as sequence quality, GC bias, polymerase chain reaction (PCR) bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity and read distribution over the genome structure. RSeQC accepts SAM and BAM files, which most RNA-seq mapping tools produce, as well as BED files, which are widely used for gene models. Most RSeQC modules use R scripts for visualization, and they are notably efficient at processing large BAM/SAM files containing hundreds of millions of alignments. AVAILABILITY AND IMPLEMENTATION: RSeQC is written in Python and C. Source code and a comprehensive user's manual are freely available at: http://code.google.com/p/rseqc/.
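To make the GC-bias check more concrete, here is a minimal Python sketch of one way to summarize per-read GC content as a histogram. It assumes the read sequences have already been extracted from the SAM/BAM file as plain strings; `gc_percent` and `gc_histogram` are hypothetical helpers for illustration only, not part of RSeQC's interface, and RSeQC's own implementation may differ.

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of G and C bases in a read; other symbols (A, T, N, ...) count only toward length."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_histogram(reads, bin_width=5):
    """Count reads per GC-percentage bin, keyed by each bin's lower edge.

    Assumes bin_width divides 100; a read with 100% GC is placed in the last bin.
    """
    hist = Counter()
    for read in reads:
        lower = int(gc_percent(read) // bin_width) * bin_width
        hist[min(lower, 100 - bin_width)] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    reads = ["GGCC", "ATAT", "ATGC", "GCGA"]
    print(gc_histogram(reads, bin_width=25))  # {0: 1, 50: 1, 75: 2}
```

Comparing such histograms across samples is one simple way to spot a sample whose GC distribution departs from the others.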
CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model
Liguo Wang, Hyun Jung Park, Surendra Dasari et al.|Nucleic Acids Research|2013
Thousands of novel transcripts have been identified using deep transcriptome sequencing. The discovery of this large, 'hidden' transcriptome has renewed demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, Coding Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic and hexamer usage bias. CPAT (sensitivity: 0.96, specificity: 0.97) outperformed other state-of-the-art alignment-based software such as Coding-Potential Calculator (sensitivity: 0.99, specificity: 0.74) and Phylo Codon Substitution Frequencies (sensitivity: 0.90, specificity: 0.63). In addition to high accuracy, CPAT is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies, enabling users to process thousands of transcripts within seconds. The software accepts input sequences in either FASTA- or BED-formatted files. We also developed a web interface for CPAT that allows users to submit sequences and receive prediction results almost instantly.
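The abstract names the model's inputs but not how they combine into a score. The Python sketch below computes the two ORF-based features, taking ORF size as the longest ATG-to-stop ORF in the three forward frames and ORF coverage as that size divided by transcript length, and passes them through a standard logistic function. It is an illustration under stated assumptions, not CPAT's code: the function names are hypothetical, CPAT's exact ORF definition may differ, the Fickett TESTCODE and hexamer features are omitted because they depend on trained frequency tables, and any weights supplied are placeholders rather than CPAT's fitted coefficients.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF in the three forward frames.

    ORFs that run off the end without a stop codon are ignored in this sketch.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_features(seq):
    """Return (ORF size, ORF coverage), where coverage = ORF size / transcript length."""
    size = longest_orf(seq)
    coverage = size / len(seq) if seq else 0.0
    return float(size), coverage


def coding_probability(features, weights, intercept):
    """Logistic regression score p = 1 / (1 + exp(-z)), with z = intercept + sum(w_i * x_i)."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    feats = orf_features("CCATGAAATAGCC")
    print(feats)  # ORF size 9.0, coverage 9/13
    # Placeholder weights and intercept, NOT CPAT's trained coefficients.
    print(coding_probability(feats, weights=[0.01, 2.0], intercept=-1.0))
```

Because the model is a single weighted sum followed by a sigmoid, scoring a transcript costs little more than one pass over its sequence to extract features.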
CrossMap: a versatile tool for coordinate conversion between genome assemblies
Hao Zhao, Zhifu Sun, Jing Wang et al.|Bioinformatics|2013
MOTIVATION: Reference genome assemblies are subject to change and refinement from time to time. Researchers often need to convert results analyzed against an old assembly to a newer version, or vice versa, to facilitate meta-analysis, direct comparison, data integration and visualization. Several existing tools can convert genome interval files in browser extensible data (BED) or general feature format (GFF), but none can convert files in sequence alignment map (SAM) or BigWig format. This is a significant gap in computational genomics tools, because these formats are the most widely used for representing high-throughput sequencing data, such as RNA-seq, chromatin immunoprecipitation sequencing and DNA-seq data. RESULTS: We developed CrossMap, a versatile and efficient tool for converting genome coordinates between assemblies. CrossMap supports most commonly used file formats, including BAM, sequence alignment map, Wiggle, BigWig, browser extensible data, general feature format, gene transfer format and variant call format. AVAILABILITY AND IMPLEMENTATION: CrossMap is written in Python and C. Source code and a comprehensive user's manual are freely available at: http://crossmap.sourceforge.net/.
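Coordinate conversion between assemblies is typically driven by gapless alignment blocks between the old and new sequences, such as the blocks encoded in UCSC chain files. The Python sketch below shows the core lookup under simplifying assumptions: a single chromosome, both assemblies on the same strand, 0-based half-open coordinates, and blocks given as hypothetical `(source_start, target_start, size)` tuples sorted by source start. `map_position` and `map_interval` are illustrative names, not CrossMap's API, and CrossMap's handling of intervals that span gaps or strand changes is not reproduced here.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical block format: (source_start, target_start, size), 0-based, half-open,
# gapless, same strand, one chromosome, sorted by source_start and non-overlapping.
Block = Tuple[int, int, int]


def _containing_block(pos: int, blocks: List[Block]) -> Optional[Block]:
    """Binary-search for the block whose source range contains pos, or None if pos is in a gap."""
    starts = [b[0] for b in blocks]
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0:
        src, _, size = blocks[i]
        if pos < src + size:
            return blocks[i]
    return None


def map_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Convert one source coordinate to the target assembly, or None if it is unaligned."""
    block = _containing_block(pos, blocks)
    if block is None:
        return None
    src, tgt, _ = block
    return tgt + (pos - src)


def map_interval(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Convert [start, end) only when it lies entirely inside one aligned block."""
    if end <= start:
        return None
    block = _containing_block(start, blocks)
    if block is None:
        return None
    src, tgt, size = block
    if end > src + size:
        return None  # interval crosses into a gap; a real tool would split or report it
    return tgt + (start - src), tgt + (end - src)


if __name__ == "__main__":
    blocks = [(0, 100, 50), (60, 200, 40)]
    print(map_position(10, blocks))       # 110
    print(map_position(55, blocks))       # None (falls in the gap 50-60)
    print(map_interval(60, 100, blocks))  # (200, 240)
```

Binary search over block starts keeps each lookup at O(log n); rebuilding the `starts` list on every call is a shortcut for brevity that a real tool would avoid by indexing the blocks once per chromosome.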
Integrated watershed management: evolution, development and emerging trends
Guangyu Wang, Shari L. Mang, Haisheng Cai et al.|Journal of Forestry Research|2016
Watershed management is an ever-evolving practice involving the management of land, water, biota, and other resources in a defined area for ecological, social, and economic purposes. In this paper, we explore the following questions: How has watershed management evolved? What new tools are available, and how can they be integrated into sustainable watershed management? To address these questions, we discuss the process of developing integrated watershed management strategies for sustainable management through the incorporation of adaptive management techniques and traditional ecological knowledge. We address the numerous benefits of integration across disciplines and jurisdictional boundaries, as well as of incorporating technological advancements, such as remote sensing, GIS, big data, and multi-level social-ecological systems analysis, into watershed management strategies. We use three case studies from China, Europe, and Canada to review the successes and failures of integrated watershed management in addressing different ecological, social, and economic dilemmas in geographically diverse locations. Although progress has been made in watershed management strategies, numerous issues still impede successful management outcomes, many of which can be remedied through holistic management approaches, incorporation of cutting-edge science and technology, and cross-jurisdictional coordination. We conclude by highlighting that future watershed management will need to account for climate change impacts by employing technological advancements and holistic, cross-disciplinary approaches to ensure that watersheds continue to serve their ecological, social, and economic functions.
We present three case studies in this paper as a valuable resource for scientists, resource managers, government agencies, and other stakeholders aiming to improve integrated watershed management strategies and to achieve ecological and socio-economic management objectives more efficiently and successfully.
Intrinsic BET inhibitor resistance in SPOP-mutated prostate cancer is mediated by BET protein stabilization and AKT–mTORC1 activation
Pingzhao Zhang, Dejie Wang, Yu Zhao et al.|Nature Medicine|2017