UniProt: the Universal Protein Knowledgebase in 2023. Alex Bateman, María Martin, Sandra Orchard et al. | Nucleic Acids Research | 2022. The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users' experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
UniProt: the Universal Protein Knowledgebase in 2025. Alex Bateman, María Martin, Sandra Orchard et al. | Nucleic Acids Research | 2024. The aim of the UniProt Knowledgebase (UniProtKB; https://www.uniprot.org/) is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe ongoing changes to our production pipeline to limit the sequences available in UniProtKB to high-quality, non-redundant reference proteomes. We continue to manually curate the scientific literature to add the latest functional data and use machine learning techniques. We also encourage community curation to ensure key publications are not missed. We provide an update on the automatic annotation methods used by UniProtKB to predict information for unreviewed entries describing unstudied proteins. Finally, updates to the UniProt website are described, including a new tab linking protein to genomic information. In recognition of its value to the scientific community, the UniProt database has been awarded Global Core Biodata Resource status.
Annotation of biologically relevant ligands in UniProtKB using ChEBI. MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
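
The abstract above notes that UniProtKB records are available as text, XML and RDF through the REST API. The following is a minimal sketch of how per-format download URLs for a single record might be built. The host `https://rest.uniprot.org/uniprotkb`, the file-extension scheme and the example accession `P00533` are assumptions for illustration and are not taken from the abstract; the helper `entry_url` is hypothetical.

```python
# Assumed base URL for UniProtKB entry retrieval (not stated in the abstract).
BASE_URL = "https://rest.uniprot.org/uniprotkb"

# Assumed mapping from the formats named in the abstract to URL extensions.
FORMAT_EXTENSIONS = {"text": "txt", "xml": "xml", "rdf": "rdf"}


def entry_url(accession: str, fmt: str) -> str:
    """Return the assumed download URL for one record in one format."""
    try:
        extension = FORMAT_EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return f"{BASE_URL}/{accession}.{extension}"


if __name__ == "__main__":
    # P00533 is used purely as an illustrative accession.
    for fmt in FORMAT_EXTENSIONS:
        print(fmt, entry_url("P00533", fmt))
```

The sketch only constructs URLs and performs no network request, so the binding site annotations themselves would still need to be fetched and parsed from whichever format is chosen.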
The UniProt website API: facilitating programmatic access to protein knowledge. The UniProt REST API is a freely available, open-access resource that powers the UniProt.org website and gives users flexible programmatic interaction with protein knowledge data. It provides access to UniProtKB, UniRef, UniParc, Proteomes, GeneCentric, ARBA, UniRule, and the ID Mapping tool, along with supporting data and controlled vocabularies. Users can access the API with their favorite programming language and can use the API documentation page (https://www.uniprot.org/api-documentation) to generate example code snippets in various languages for accessing the UniProt databases. API results can be returned and downloaded in various formats. The API received an average of 303 million requests per month over the last year. It enables structured search queries using logical operators and parentheses, allows users to specify fields of interest within results, and supports customized downloads for direct integration into workflows. The API is a powerful tool that empowers users to fully utilize UniProt data across multiple datasets, enabling download, analysis, and extraction of valuable research insights. This website is free and open to all users, and there is no login requirement.
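
The structured queries described above (logical operators, parentheses, selected result fields, a chosen output format) can be illustrated with a minimal sketch that assembles a search URL. The endpoint `https://rest.uniprot.org/uniprotkb/search`, the parameter names `query`, `fields`, `format` and `size`, and the query field names `gene`, `organism_id` and `reviewed` are assumptions not stated in the abstract and should be checked against the API documentation page; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed UniProtKB search endpoint (not given in the abstract).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fields: list[str], fmt: str = "tsv", size: int = 25) -> str:
    """Assemble a search URL from a query string and the result fields wanted.

    The parameter names used here are assumptions about the API.
    """
    params = {
        "query": query,              # logical operators and parentheses go here
        "fields": ",".join(fields),  # restrict the columns returned
        "format": fmt,               # output format of the result
        "size": size,                # number of results per page
    }
    # urlencode percent-encodes parentheses and colons and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query: two genes combined with OR, then narrowed with AND.
    query = "(gene:BRCA1 OR gene:BRCA2) AND organism_id:9606 AND reviewed:true"
    print(build_search_url(query, ["accession", "gene_names", "length"]))
```

Building the URL separately from sending the request keeps the query logic easy to inspect and test; the resulting string can then be passed to any HTTP client in a workflow.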