E

Ei-Wen Yang

Attagene

Publishes on Cancer Genomics and Diagnostics, RNA Research and Splicing, Radiomics and Machine Learning in Medical Imaging. 43 papers and 776 citations.

43Publications
776Total Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

RNA editing in nascent RNA affects pre-mRNA splicing
Yun-Hua Esther Hsiao, Jae Hoon Bahn, Yun Yang et al.|Genome Research|2018
Cited by 183Open Access

In eukaryotes, nascent RNA transcripts undergo an intricate series of RNA processing steps to achieve mRNA maturation. RNA editing and alternative splicing are two major RNA processing steps that can introduce significant modifications to the final gene products. By tackling these processes in isolation, recent studies have enabled substantial progress in understanding their global RNA targets and regulatory pathways. However, the interplay between individual steps of RNA processing, an essential aspect of gene regulation, remains poorly understood. By sequencing the RNA of different subcellular fractions, we examined the timing of adenosine-to-inosine (A-to-I) RNA editing and its impact on alternative splicing. We observed that >95% A-to-I RNA editing events occurred in the chromatin-associated RNA prior to polyadenylation. We report about 500 editing sites in the 3' acceptor sequences that can alter splicing of the associated exons. These exons are highly conserved during evolution and reside in genes with important cellular function. Furthermore, we identified a second class of exons whose splicing is likely modulated by RNA secondary structures that are recognized by the RNA editing machinery. The genome-wide analyses, supported by experimental validations, revealed remarkable interplay between RNA editing and splicing and expanded the repertoire of functional RNA editing sites.

Regulation of RNA editing by RNA-binding proteins in human cells
Giovanni Quinones-Valdez, Stephen Tran, Hyun-Ik Jun et al.|Communications Biology|2019
Cited by 154Open Access

Abstract Adenosine-to-inosine (A-to-I) editing, mediated by the ADAR enzymes, diversifies the transcriptome by altering RNA sequences. Recent studies reported global changes in RNA editing in disease and development. Such widespread editing variations necessitate an improved understanding of the regulatory mechanisms of RNA editing. Here, we study the roles of >200 RNA-binding proteins (RBPs) in mediating RNA editing in two human cell lines. Using RNA-sequencing and global protein-RNA binding data, we identify a number of RBPs as key regulators of A-to-I editing. These RBPs, such as TDP-43, DROSHA, NF45/90 and Ro60, mediate editing through various mechanisms including regulation of ADAR1 expression, interaction with ADAR1, and binding to Alu elements. We highlight that editing regulation by Ro60 is consistent with the global up-regulation of RNA editing in systemic lupus erythematosus. Additionally, most key editing regulators act in a cell type-specific manner. Together, our work provides insights for the regulatory mechanisms of RNA editing.

Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA
Ei-Wen Yang, Jae Hoon Bahn, Esther Yun-Hua Hsiao et al.|Nature Communications|2019
Cited by 56Open Access

Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants (GVs) mediating post-transcriptional regulation. Recently, genome-wide detection of in vivo binding of RNA-binding proteins is greatly facilitated by the enhanced crosslinking and immunoprecipitation (eCLIP) method. We developed a new computational approach, called BEAPR, to identify allele-specific binding (ASB) events in eCLIP-Seq data. BEAPR takes into account crosslinking-induced sequence propensity and variations between replicated experiments. Using simulated and actual data, we show that BEAPR largely outperforms often-used count analysis methods. Importantly, BEAPR overcomes the inherent overdispersion problem of these methods. Complemented by experimental validations, we demonstrate that the application of BEAPR to ENCODE eCLIP-Seq data of 154 proteins helps to predict functional GVs that alter splicing or mRNA abundance. Moreover, many GVs with ASB patterns have known disease relevance. Overall, BEAPR is an effective method that helps to address the outstanding challenge of functional interpretation of GVs.

Rationalization and Design of the Complementarity Determining Region Sequences in an Antibody-Antigen Recognition Interface
Cited by 48Open Access

Protein-protein interactions are critical determinants in biological systems. Engineered proteins binding to specific areas on protein surfaces could lead to therapeutics or diagnostics for treating diseases in humans. But designing epitope-specific protein-protein interactions with computational atomistic interaction free energy remains a difficult challenge. Here we show that, with the antibody-VEGF (vascular endothelial growth factor) interaction as a model system, the experimentally observed amino acid preferences in the antibody-antigen interface can be rationalized with 3-dimensional distributions of interacting atoms derived from the database of protein structures. Machine learning models established on the rationalization can be generalized to design amino acid preferences in antibody-antigen interfaces, for which the experimental validations are tractable with current high throughput synthetic antibody display technologies. Leave-one-out cross validation on the benchmark system yielded the accuracy, precision, recall (sensitivity) and specificity of the overall binary predictions to be 0.69, 0.45, 0.63, and 0.71 respectively, and the overall Matthews correlation coefficient of the 20 amino acid types in the 24 interface CDR positions was 0.312. The structure-based computational antibody design methodology was further tested with other antibodies binding to VEGF. The results indicate that the methodology could provide alternatives to the current antibody technologies based on animal immune systems in engineering therapeutic and diagnostic antibodies against predetermined antigen epitopes.

Accurate inference of isoforms from multiple sample RNA-Seq data
Masruba Tasnim, Shining Ma, Ei-Wen Yang et al.|BMC Genomics|2015
Cited by 35Open Access

BACKGROUND: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and is serving as a basis to analyze the structural and quantitative differences of expressed isoforms between samples. However, the current transcriptome assembly algorithms are not specifically designed to handle large amounts of errors that are inherent in real RNA-Seq datasets, especially those involving multiple samples, making downstream differential analysis applications difficult. On the other hand, multiple sample RNA-Seq datasets may provide more information than single sample datasets that can be utilized to improve the performance of transcriptome assembly and abundance estimation, but such information remains overlooked by the existing assembly tools. RESULTS: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is an NP-hard problem. Instead, we develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets and comparison with the existing assembly tools demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with a greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.