Jingwen Bai

European Bioinformatics Institute

Publishes on Advanced Proteomics Techniques and Applications, Research Data Management Practices, Scientific Computing and Data Management. 13 papers and 15.8k citations.

Top publications by citations

The PRIDE database and related tools and resources in 2019: improving support for quantification data
Yasset Pérez‐Riverol, Attila Csordás, Jingwen Bai et al.|Nucleic Acids Research|2018
Cited by 7.4k | Open Access

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences
Yasset Pérez‐Riverol, Jingwen Bai, Chakradhar Bandla et al.|Nucleic Acids Research|2021
Cited by 6.7k | Open Access

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

The PRIDE database at 20 years: 2025 update
Yasset Pérez‐Riverol, Chakradhar Bandla, Deepti J Kundu et al.|Nucleic Acids Research|2024
Cited by 1.6k | Open Access

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools for the last three years. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 534 datasets per month. This has been possible thanks to continuous improvements in infrastructure such as a new file transfer protocol for very large datasets (Globus), a new data resubmission pipeline and an automatic dataset validation process. Additionally, we will highlight novel activities such as the availability of the PRIDE chatbot (based on the use of open-source Large Language Models), and our work to improve support for MS crosslinking datasets. Furthermore, we will describe how we have increased our efforts to reuse, reanalyze and disseminate high-quality proteomics data into added-value resources such as UniProt, Ensembl and Expression Atlas.

Evolution Analysis of the Aux/IAA Gene Family in Plants Shows Dual Origins and Variable Nuclear Localization Signals
Wentao Wu, Yaxue Liu, Yuqian Wang et al.|International Journal of Molecular Sciences|2017
Cited by 102 | Open Access

The plant hormone auxin plays pivotal roles in many aspects of plant growth and development. The auxin/indole-3-acetic acid (Aux/IAA) gene family encodes short-lived nuclear proteins acting on auxin perception and signaling, but the evolutionary history of this gene family remains to be elucidated. In this study, the Aux/IAA gene family in 17 plant species covering all major lineages of plants is identified and analyzed by using multiple bioinformatics methods. A total of 434 Aux/IAA genes were found among these plant species, and the gene copy number ranges from three (Physcomitrella patens) to 63 (Glycine max). The phylogenetic analysis shows that the canonical Aux/IAA proteins can be generally divided into five major clades, and the origin of Aux/IAA proteins could be traced back to the common ancestor of land plants and green algae. Many truncated Aux/IAA proteins were found, and some of these truncated Aux/IAA proteins may be generated from the C-terminal truncation of auxin response factor (ARF) proteins. Our results indicate that tandem and segmental duplications play dominant roles in the expansion of the Aux/IAA gene family, mainly under purifying selection. The putative nuclear localization signals (NLSs) in Aux/IAA proteins are conserved, and two kinds of new primordial bipartite NLSs in P. patens and Selaginella moellendorffii were discovered. Our findings not only give insights into the origin and expansion of the Aux/IAA gene family, but also provide a basis for understanding their functions during the course of evolution.

Automated atrial fibrillation detection based on deep learning network
Chan Yuan, Yan Yan, Lin Zhou et al.|Unknown|2016
Cited by 34

This work aims to address the shortcomings of existing atrial fibrillation (AF) detection algorithms and to improve the intelligent recognition and extraction of AF signals. Recently, deep learning with massive data has been widely used in image, voice and other fields. In this paper, a method based on the stacked sparse autoencoder neural network, an instance of the deep learning strategy, is proposed for AF detection. Greedy layer-wise training algorithms and massive unlabeled Holter data from a hospital were used to train the deep learning system, and the Back Propagation algorithm and half of the MIT-BIH standard databases were applied to optimize the whole system. The other half of the standard data was used to evaluate the performance of this method. The autoencoder learns high-level features from the raw data that better describe the necessary information. The experimental results show that the accuracy of the algorithm based on the stacked sparse autoencoder is 98.309%, so this approach is of great significance for the real-time monitoring of atrial fibrillation signals in electrocardiograms.