GPS-pPLM: A Language Model for Prediction of Prokaryotic Phosphorylation Sites

C. Zhang, Dachao Tang, Cheng Han, Yujie Gou, Miaomiao Chen, Xinhe Huang, Dan Liu, Miaoying Zhao, Leming Xiao, Qiang Xiao, Di Peng, Yu Xue (all Huazhong University of Science and Technology)
Cells
November 8, 2024
Cited by 3; Open Access

Abstract

In prokaryotes, protein phosphorylation is one of the most important posttranslational modifications (PTMs) and is involved in orchestrating a broad spectrum of biological processes. Here, we report an updated online server, the group-based prediction system for prokaryotic phosphorylation language model (GPS-pPLM), for predicting phosphorylation sites (p-sites) in prokaryotes. For model training, two deep learning methods, a transformer and a deep neural network, were employed, and a total of 10 sequence and contextual features were integrated. Using 44,839 nonredundant p-sites in 16,041 proteins from 95 prokaryotes, two general models for the prediction of O-phosphorylation and N-phosphorylation were first pretrained and then fine-tuned to construct 6 residue-specific predictors, one for each phosphorylatable residue type, as well as 134 species-specific predictors. Compared with other existing tools, GPS-pPLM exhibits higher accuracy in predicting prokaryotic O-phosphorylation p-sites. Users can submit protein sequences in FASTA format or UniProt accession numbers, and the predicted results are displayed in tabular form. In addition, we annotate the predicted p-sites with knowledge from 22 public resources, including experimental evidence, 3D structures, and disorder tendencies. The GPS-pPLM online service is freely accessible for academic research.
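
To make the input side of such a predictor concrete, the following is a minimal sketch of how FASTA input might be turned into candidate sites before scoring. It is not the GPS-pPLM implementation. The residue grouping (S/T/Y for O-phosphorylation, H/K/R for N-phosphorylation), the flank size of 7, the padding character, and the function names `parse_fasta` and `candidate_sites` are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, Iterator, Tuple

# Assumed grouping: the abstract names O- and N-phosphorylation and six
# residue types, but does not list the residues. These sets are illustrative.
O_RESIDUES = set("STY")
N_RESIDUES = set("HKR")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            # Use the first whitespace-delimited token as the identifier.
            header = (line[1:].split() or [""])[0]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def candidate_sites(
    seq: str, flank: int = 7, pad: str = "-"
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (1-based position, residue, 'O' or 'N', window) per candidate.

    The window is the residue plus `flank` positions on each side, padded
    with `pad` at the sequence ends. The flank size is an assumed value.
    """
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa in O_RESIDUES:
            kind = "O"
        elif aa in N_RESIDUES:
            kind = "N"
        else:
            continue
        yield i + 1, aa, kind, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo_protein example\nMKST\n"
    for name, seq in parse_fasta(demo).items():
        for site in candidate_sites(seq, flank=1):
            print(name, *site)
```

Each yielded window would then be passed to a general or residue-specific model for scoring; that step is omitted here because the abstract does not describe the model interface.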

