Y

Yang Shen

Tiangong University

ORCID: 0000-0002-1703-7796

Publishes on Protein Structure and Dynamics, Computational Drug Discovery Methods, RNA and protein synthesis mechanisms. 227 papers and 10.6k citations.

227Publications
10.6kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas
Cited by 1.2kOpen Access

DNA damage repair (DDR) pathways modulate cancer risk, progression, and therapeutic response. We systematically analyzed somatic alterations to provide a comprehensive view of DDR deficiency across 33 cancer types. Mutations with accompanying loss of heterozygosity were observed in over 1/3 of DDR genes, including TP53 and BRCA1/2. Other prevalent alterations included epigenetic silencing of the direct repair genes EXO5, MGMT, and ALKBH3 in ∼20% of samples. Homologous recombination deficiency (HRD) was present at varying frequency in many cancer types, most notably ovarian cancer. However, in contrast to ovarian cancer, HRD was associated with worse outcomes in several other cancers. Protein structure-based analyses allowed us to predict functional consequences of rare, recurrent DDR mutations. A new machine-learning-based classifier developed from gene expression data allowed us to identify alterations that phenocopy deleterious TP53 mutations. These frequent DDR gene alterations in many human cancers have functional consequences that may determine cancer progression and guide therapy.

Graph Contrastive Learning with Augmentations
Yuning You, Tianlong Chen, Yongduo Sui et al.|arXiv (Cornell University)|2020
Cited by 864Open Access

Generalizable, transferrable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents nor using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferrability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our codes are available at https://github.com/Shen-Lab/GraphCL.

DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks
Mostafa Karimi, Di Wu, Zhangyang Wang et al.|Bioinformatics|2019
Cited by 524Open Access

Abstract Motivation Drug discovery demands rapid quantification of compound–protein interaction (CPI). However, there is a lack of methods that can predict compound–protein affinity from sequences alone with high applicability, accuracy and interpretability. Results We present a seamless integration of domain knowledges and learning-based approaches. Under novel representations of structurally annotated protein sequences, a semi-supervised deep learning model that unifies recurrent and convolutional neural networks has been proposed to exploit both unlabeled and labeled data, for jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options in achieving relative error in IC50 within 5-fold for test cases and 20-fold for protein classes not included for training. Performances for new protein classes with few labeled data are further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded to our model to add to its interpretability, as illustrated in case studies for predicting and explaining selective drug–target interactions. Lastly, alternative representations using protein sequences or compound graphs and a unified RNN/GCNN-CNN model using graph CNN (GCNN) are also explored to reveal algorithmic challenges ahead. Availability and implementation Data and source codes are available at https://github.com/Shen-Lab/DeepAffinity. Supplementary information Supplementary data are available at Bioinformatics online.