InterPro: the protein sequence classification resource in 2025

Matthias Blum (European Bioinformatics Institute), Antonina Andreeva (European Bioinformatics Institute), Laise Cavalcanti Florentino (European Bioinformatics Institute), Sara Chuguransky (European Bioinformatics Institute), Tiago Grego (European Bioinformatics Institute), Emma Hobbs (European Bioinformatics Institute), Beatriz Lázaro (European Bioinformatics Institute), Ailsa Orr (European Bioinformatics Institute), Typhaine Paysan‐Lafosse (European Bioinformatics Institute), Irina Ponamareva (European Bioinformatics Institute), Gustavo A Salazar (European Bioinformatics Institute), Nicola Bordin (Institute of Structural and Molecular Biology), Peer Bork (European Molecular Biology Laboratory), Alan Bridge (SIB Swiss Institute of Bioinformatics), Lucy J. Colwell (Google (United States)), Julian Gough (MRC Laboratory of Molecular Biology), Daniel H. Haft (National Institutes of Health), Ivica Letunić (Biobyte Solutions (Germany)), Felipe Llinares-López (Google (United States)), Aron Marchler‐Bauer (National Institutes of Health), Laetitia Meng-Papaxanthos (Google (United States)), Huaiyu Mi (University of Southern California), Darren A. Natale (Georgetown University), Christine Orengo (Institute of Structural and Molecular Biology), Arun Prasad Pandurangan (MRC Laboratory of Molecular Biology), Damiano Piovesan (University of Padua), Catherine Rivoire (SIB Swiss Institute of Bioinformatics), Christian J A Sigrist (SIB Swiss Institute of Bioinformatics), Narmada Thanki (National Institutes of Health), Françoise Thibaud‐Nissen (National Institutes of Health), Paul D. Thomas (University of Southern California), Silvio C. E. Tosatto (University of Padua), Cathy Wu (Georgetown University), Alex Bateman (European Bioinformatics Institute)
Nucleic Acids Research
November 20, 2024
Open Access

Abstract

InterPro (https://www.ebi.ac.uk/interpro) is a freely accessible resource for the classification of protein sequences into families. It integrates predictive models, known as signatures, from multiple member databases to classify sequences into families and predict the presence of domains and significant sites. The InterPro database provides annotations for over 200 million sequences, ensuring extensive coverage of UniProtKB, the standard repository of protein sequences, and includes mappings to several other major resources, such as Gene Ontology (GO), Protein Data Bank in Europe (PDBe) and the AlphaFold Protein Structure Database. In this publication, we report on the status of InterPro (version 101.0), detailing new developments in the database, associated web interface and software. Notable updates include the increased integration of structures predicted by AlphaFold and the enhanced description of protein families using artificial intelligence. Over the past two years, more than 5000 new InterPro entries have been created. The InterPro website now offers access to 85 000 protein families and domains from its member databases and serves as a long-term archive for retired databases. InterPro data, software and tools are freely available.

