GPSD: a hybrid learning framework for the prediction of phosphatase-specific dephosphorylation sites

Cheng Han (Huazhong University of Science and Technology), Shanshan Fu (Huazhong University of Science and Technology), Miaomiao Chen (Huazhong University of Science and Technology), Yujie Gou (Huazhong University of Science and Technology), Dan Liu (Huazhong University of Science and Technology), Chi Zhang (Huazhong University of Science and Technology), Xinhe Huang (Huazhong University of Science and Technology), Leming Xiao (Huazhong University of Science and Technology), Miaoying Zhao (Huazhong University of Science and Technology), Jiayi Zhang (Huazhong University of Science and Technology), Qiang Xiao (Huazhong University of Science and Technology), Di Peng (Huazhong University of Science and Technology), Yu Xue (Huazhong University of Science and Technology)
Briefings in Bioinformatics
November 22, 2024
Cited by 1. Open Access.

Abstract

Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases, and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains a great challenge. In this study, we manually curated 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues, from the literature and public databases. We then developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and used three types of machine learning methods: penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561 416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or multiple protein sequences in FASTA format can be submitted, and the prediction results are shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
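
To make the input side of site prediction concrete, the sketch below parses FASTA text and enumerates every serine, threonine, and tyrosine residue as a candidate site, together with a fixed-length flanking peptide. It is only an illustration under stated assumptions and not the GPSD implementation: the function names `parse_fasta` and `candidate_sites`, the flank length of 7 residues, and the `-` padding character are all hypothetical choices that the abstract does not specify.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {identifier: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            parts = line[1:].split()
            header = parts[0] if parts else f"unnamed_{len(records) + 1}"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def candidate_sites(
    seq: str, flank: int = 7, pad: str = "-"
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, peptide) for each S/T/Y residue.

    The peptide has length 2 * flank + 1 and is centered on the residue.
    The flank length and padding character are assumptions for illustration.
    """
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in "STY":
            # Residue i sits at index i + flank in the padded string.
            yield i + 1, residue, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo_protein hypothetical example\nMSTAY\n"
    for name, sequence in parse_fasta(demo).items():
        for position, residue, peptide in candidate_sites(sequence, flank=2):
            print(name, position, residue, peptide)
```

Each peptide produced this way could then be encoded with sequence features and scored by a trained model; that scoring step is omitted here because the abstract does not describe the models in enough detail to reproduce them.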

