Shian Su

Opportunities and challenges in long-read sequencing data analysis

Shanika L. Amarasinghe, Shian Su, Xueyi Dong et al.|Genome biology|2020

Cited by 2.6kOpen Access

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

Charity W. Law, Monther Alhamdoosh, Shian Su et al.|F1000Research|2018

Cited by 653Open Access

<ns3:p>The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular <ns3:bold>edgeR</ns3:bold> package to import, organise, filter and normalise the data, followed by the <ns3:bold>limma</ns3:bold> package with its <ns3:italic>voom</ns3:italic> method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the <ns3:bold>Glimma</ns3:bold> package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.</ns3:p>

Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Ruijie Liu, Aliaksei Z. Holik, Shian Su et al.|Nucleic Acids Research|2015

Cited by 635Open Access

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.

RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

Charity W. Law, Monther Alhamdoosh, Shian Su et al.|F1000Research|2016

Cited by 574Open Access

<ns3:p>The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.</ns3:p>

Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments

Luyi Tian, Xueyi Dong, Saskia Freytag et al.|Nature Methods|2019

Cited by 406

Is this you? Claim your profile.

Top publicationsby citations