
Matthieu Barba

European Bioinformatics Institute

ORCID: 0000-0002-7882-8356

Publishes on Genomics and Phylogenetic Studies, Bioinformatics and Genomic Networks, Biochemical and Molecular Research. 22 papers and 3.5k citations.



Top publications by citations

Ensembl 2024
Cited by 714 | Open Access

Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.

Ensembl Genomes 2020—enabling non-vertebrate genomic research
Kevin Howe, Bruno Contreras‐Moreira, Nishadi De Silva et al. | Nucleic Acids Research | 2019
Cited by 525 | Open Access

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.

VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center
B Kirtley Amos, Cristina Aurrecoechea, Matthieu Barba et al. | Nucleic Acids Research | 2021
Cited by 524 | Open Access

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, and improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.

Ensembl 2025
Sarah Dyer, Olanrewaju Austine-Orimoloye, Andrey G Azov et al. | Nucleic Acids Research | 2024
Cited by 518 | Open Access

Ensembl (www.ensembl.org) is an open platform integrating publicly available genomics data across the tree of life with a focus on eukaryotic species related to human health, agriculture and biodiversity. This year has seen a continued expansion in the number of species represented, with >4800 eukaryotic and >31 300 prokaryotic genomes available. The new Ensembl site, currently in beta, has continued to develop, currently holding >2700 eukaryotic genome assemblies. The new site provides genome, gene, transcript, homology and variation views, and will replace the current Rapid Release site; this represents a key step towards provision of a single integrated Ensembl site. Additional activities have included developing improved regulatory annotation for human, mouse and agricultural species, and expanding the Ensembl Variant Effect Predictor tool. To learn more about Ensembl, help and documentation are available along with an extensive training program that can be accessed via our training pages.

Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species
Paul Kersey, James E. Allen, Alexis Allot et al. | Nucleic Acids Research | 2017
Cited by 484 | Open Access

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.