DEGseq: an R package for identifying differentially expressed genes from RNA-seq data
Likun Wang, Zhixing Feng, Xi Wang et al.|Bioinformatics|2009
Abstract Summary: High-throughput RNA sequencing (RNA-seq) is rapidly emerging as a major quantitative transcriptome profiling platform. Here, we present DEGseq, an R package to identify differentially expressed genes or isoforms from RNA-seq data of different samples. In this package, we integrated three existing methods and introduced two novel methods based on the MA-plot to detect and visualize differences in gene expression. Availability: The R package and a quick-start vignette are available at http://bioinfo.au.tsinghua.edu.cn/software/degseq Contact: xwwang@tsinghua.edu.cn; zhangxg@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
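An MA-plot, which DEGseq's two new methods build on, places each gene at its log ratio M between two samples (vertical axis) against its average log expression A (horizontal axis). The minimal Python sketch below computes those coordinates. It is not DEGseq's code (the package is written in R) and does not implement its statistical tests; the library-size scaling, the pseudocount, and the name `ma_values` are illustrative assumptions.

```python
import math
from typing import List, Tuple


def ma_values(counts1: List[int], counts2: List[int],
              pseudocount: float = 0.5) -> List[Tuple[float, float]]:
    """Compute MA-plot coordinates (M, A) for each gene in two samples."""
    # Scale each sample by its total read count so that a difference in
    # sequencing depth is not mistaken for differential expression.
    total1, total2 = sum(counts1), sum(counts2)
    points = []
    for c1, c2 in zip(counts1, counts2):
        # The pseudocount (an assumption) keeps log2 finite for zero counts.
        l1 = math.log2((c1 + pseudocount) / total1)
        l2 = math.log2((c2 + pseudocount) / total2)
        m = l1 - l2          # log2 fold change between the samples
        a = (l1 + l2) / 2    # average log2 expression
        points.append((m, a))
    return points
```

For example, `ma_values([8, 2], [2, 8], pseudocount=0)` places the first gene at M ≈ 2 and the second at M ≈ -2, both at the same A. Genes far from M = 0 at a given A are candidates for differential expression, subject to a proper statistical test.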
Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing
Epigenetic Switch Driven by DNA Inversions Dictates Phase Variation in Streptococcus pneumoniae
Jing Li, Jing Li, Zhixing Feng et al.|PLoS Pathogens|2016
DNA methylation is an important epigenetic mechanism for phenotypic diversification in all forms of life. We previously described remarkable cell-to-cell heterogeneity in epigenetic pattern within a clonal population of Streptococcus pneumoniae, a leading human pathogen. We here report that this epigenetic diversity is caused by extensive DNA inversions among hsdSA, hsdSB, and hsdSC, three methyltransferase hsdS genes in the Spn556II type-I restriction-modification (R-M) locus. Because hsdSA encodes the sequence recognition subunit of this type-I R-M DNA methyltransferase, these site-specific recombinations generate pneumococcal cells with variable HsdSA alleles and thereby diverse genome methylation patterns. Most importantly, the DNA methylation pattern specified by the HsdSA1 allele leads to the formation of opaque colonies, whereas pneumococci lacking HsdSA1 produce transparent colonies. Furthermore, this HsdSA1-dependent phase variation requires intact DNA methylase activity encoded by hsdM in the Spn556II (renamed colony opacity determinant, or cod) locus. Thus, the DNA inversion-driven ON/OFF switch of the hsdSA1 allele in the cod locus and the resulting epigenetic switch dictate phase variation between the opaque and transparent phenotypes. Phase variation is well documented as important in pneumococcal carriage and invasive infection, but its molecular basis has remained unclear. Our work has discovered a novel epigenetic cause for this significant pathobiological phenomenon in S. pneumoniae.
Lastly, our findings represent a significant advance in our understanding of bacterial R-M systems and their potential to shape the epigenetic and phenotypic diversity of prokaryotic organisms, because similar site-specific recombination systems are widespread in many archaeal and bacterial species.
Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases
Current-generation DNA sequencing instruments are moving closer to seamlessly sequencing the genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made in identifying small nucleotide variants and structural variants in DNA that impact phenotypes of interest, progress has been less dramatic for epigenetic changes and base-level damage to DNA, largely because of technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule, real-time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path toward detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed that enhances the power to detect these events while also controlling for false positives. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework that incorporates kinetic information both at a test position of interest and at neighboring sites, which enhances the power to detect KV events. We explore the performance of this and related models and apply the best-performing model to plasmid DNA isolated from Escherichia coli and to mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
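The idea of borrowing strength from neighboring sites can be illustrated with a much simpler stand-in for the conditional random field described above. In this hypothetical sketch, per-read kinetic values are assumed to be interpulse durations (IPDs); each position gets a Welch t-statistic comparing log IPDs in native versus control reads, and those statistics are pooled over nearby positions with a weighted, Stouffer-style sum. This is not a CRF and not the authors' model; the function names, the weights, and the treatment of t-statistics as roughly normal and of neighbors as independent are all assumptions.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def welch_t(native: Sequence[float], control: Sequence[float]) -> float:
    """Welch t-statistic on log-IPD values at one position.

    Needs at least two reads per group and non-zero variance in at least one.
    """
    ln = [math.log(x) for x in native]
    lc = [math.log(x) for x in control]
    se = math.sqrt(variance(ln) / len(ln) + variance(lc) / len(lc))
    return (mean(ln) - mean(lc)) / se


def neighborhood_scores(native: Dict[int, List[float]],
                        control: Dict[int, List[float]],
                        weights: Dict[int, float]) -> Dict[int, float]:
    """Pool per-position t-statistics over a window of neighbors.

    `weights` maps an offset (0 = test position) to a weight, for example
    {-1: 0.5, 0: 1.0, 1: 0.5}. Offsets that land on positions without
    coverage in both samples are skipped.
    """
    t = {pos: welch_t(native[pos], control[pos])
         for pos in native if pos in control}
    scores = {}
    for pos in t:
        num, wsq = 0.0, 0.0
        for off, w in weights.items():
            if pos + off in t:
                num += w * t[pos + off]
                wsq += w * w
        # Weighted Stouffer combination: sum(w * t) / sqrt(sum(w^2)).
        scores[pos] = num / math.sqrt(wsq) if wsq > 0 else 0.0
    return scores
```

Positions adjacent to a strong signal inherit part of it through the neighbor weights. A real model, such as the CRF in the abstract, has to account for that correlation to keep false positives under control.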
Detecting DNA Modifications from SMRT Sequencing Data by Modeling Sequence Context Dependence of Polymerase Kinetic
Zhixing Feng, Gang Fang, Jonas Korlach et al.|PLoS Computational Biology|2013
DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single-molecule, real-time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction. This makes it possible to estimate the expected kinetic rate of the enzyme at the incorporation site from kinetic rate information collected in existing SMRT sequencing data (historical data) covering the same local sequence contexts. We develop an empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model can greatly increase DNA modification detection accuracy and reduce the coverage required for control data. For some DNA modifications with a strong signal, a control sample is not needed at all, because historical data can serve as an alternative to the control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.
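The empirical Bayes idea can be sketched with a textbook normal-normal model. Per-dataset mean log IPDs (interpulse durations) observed for one sequence context in historical data give a prior for the expected unmodified rate, and whatever control reads exist then update that prior. This is a generic illustration, not seqPatch's hierarchical model; the normal likelihood, the known noise variance, the method-of-moments prior (which ignores within-dataset noise), and the function names are assumptions.

```python
from statistics import mean, variance
from typing import List, Tuple


def estimate_prior(historical_means: List[float]) -> Tuple[float, float]:
    """Empirical Bayes step: prior mean and variance for one sequence
    context, estimated from per-dataset mean log-IPD values."""
    return mean(historical_means), variance(historical_means)


def posterior_rate(prior_mean: float, prior_var: float,
                   control: List[float],
                   noise_var: float) -> Tuple[float, float]:
    """Normal-normal update of the expected unmodified log-IPD at one site.

    Each control read is modeled as N(rate, noise_var) and the rate has
    prior N(prior_mean, prior_var). Returns (posterior mean, posterior variance).
    """
    n = len(control)
    # Precisions add: prior precision plus n reads' worth of data precision.
    precision = 1.0 / prior_var + n / noise_var
    # Precision-weighted combination of the prior mean and the read values.
    weighted = prior_mean / prior_var + sum(control) / noise_var
    return weighted / precision, 1.0 / precision
```

With no control reads, `posterior_rate(m, v, [], noise_var)` returns the prior unchanged, mirroring the abstract's point that historical data can stand in for a control sample. Each added control read pulls the estimate toward the observed data and shrinks its variance.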