Publishes on Genomics and Phylogenetic Studies, Insect-Plant Interactions and Control, Insect symbiosis and bacterial influences. 32 papers and 20.3k citations.
MOTIVATION: Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50. RESULTS: We propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content. We implemented the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs, named BUSCO. AVAILABILITY AND IMPLEMENTATION: Software implemented in Python and datasets available for download from http://busco.ezlab.org. CONTACT: evgeny.zdobnov@unige.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
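As a point of comparison for the technical measures mentioned above, N50 can be computed from contig lengths alone. The following is a minimal sketch, assuming the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` is illustrative and is not part of the BUSCO software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumed definition: sort contigs from longest to shortest and add them up;
    N50 is the length of the contig at which the running total first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 describes contiguity only: two assemblies with the same N50 can differ widely in how many expected genes they contain, which is the gap a gene-content measure is meant to fill.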
Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
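Quantifying completeness as a fraction of expected genes can be illustrated with a short sketch. It assumes each expected ortholog has already been classified as complete single-copy, complete duplicated, fragmented, or missing, and it formats the percentages in the compact style commonly seen in BUSCO reports. The class `BuscoCounts` and the function `summarize` are hypothetical names for illustration; they are not taken from the BUSCO code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Counts of expected orthologs per category (hypothetical container)."""
    single: int      # complete and found once
    duplicated: int  # complete and found more than once
    fragmented: int  # only partially recovered
    missing: int     # not found

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing


def summarize(counts: BuscoCounts) -> str:
    """Format category counts as percentages of the expected gene set."""
    n = counts.total
    if n == 0:
        raise ValueError("no orthologs were assessed")

    def pct(x: int) -> str:
        return f"{100.0 * x / n:.1f}%"

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete)}[S:{pct(counts.single)},D:{pct(counts.duplicated)}],"
        f"F:{pct(counts.fragmented)},M:{pct(counts.missing)},n:{n}"
    )


if __name__ == "__main__":
    print(summarize(BuscoCounts(single=950, duplicated=20, fragmented=10, missing=20)))
```

The counts in the usage line are made-up values chosen only to show the output format, not results from any assessment.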
OrthoDB (https://www.orthodb.org) provides evolutionary and functional annotations of orthologs. This update features a major scaling up of the resource coverage, sampling the genomic diversity of 1271 eukaryotes, 6013 prokaryotes and 6488 viruses. These include putative orthologs among 448 metazoan, 117 plant, 549 fungal, 148 protist, 5609 bacterial, and 404 archaeal genomes, selecting the best sequenced and annotated representatives for each species or operational taxonomic unit. OrthoDB relies on a concept of hierarchy of levels-of-orthology to enable more finely resolved gene orthologies for more closely related species. Since orthologs are the most likely candidates to retain functions of their ancestor gene, OrthoDB is aimed at narrowing down hypotheses about gene functions and enabling comparative evolutionary studies. Optional registered-user sessions allow online BUSCO assessments of gene set completeness and mapping of the uploaded data to OrthoDB to enable further interactive exploration of related annotations and generation of comparative charts. The accelerating expansion of genomics data continues to add valuable information, and OrthoDB strives to provide orthologs from the broadest coverage of species, as well as to extensively collate available functional annotations and to compute evolutionary annotations. The data can be browsed online, downloaded or accessed via REST API or SPARQL RDF compatible with both UniProt and Ensembl.
INTRODUCTION: Control of mosquito vectors has historically proven to be an effective means of eliminating malaria. Human malaria is transmitted only by mosquitoes in the genus Anopheles, but not all species within the genus, or even all members of each vector species, are efficient malaria vectors. Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. RATIONALE: This variation in vectorial capacity suggests an underlying genetic/genomic plasticity that results in variation of key traits determining vectorial capacity within the genus. Sequencing the genome of Anopheles gambiae, the most important malaria vector in sub-Saharan Africa, has offered numerous insights into how that species became highly specialized to live among and feed upon humans and how susceptibility to mosquito control strategies is determined. Until very recently, similar genomic resources have not existed for other anophelines, limiting comparisons to individual genes or sets of genomic markers with no genome-wide data to investigate attributes associated with vectorial capacity across the genus. RESULTS: We sequenced and assembled the genomes and transcriptomes of 16 anophelines from Africa, Asia, Europe, and Latin America, spanning ~100 million years of evolution and chosen to represent a range of evolutionary distances from An. gambiae, a variety of geographic locations and ecological conditions, and varying degrees of vectorial capacity. Genome assembly quality reflected DNA template quality and homozygosity. Despite variation in contiguity, the assemblies were remarkably complete and searches for arthropod-wide single-copy orthologs generally revealed few missing genes. Genome annotation supported with RNA sequencing transcriptomes yielded between 10,738 and 16,149 protein-coding genes for each species.
Relative to Drosophila, the closest dipteran genus for which equivalent genomic resources exist, Anopheles exhibits a dynamic genomic evolutionary profile. Comparative analyses show a fivefold faster rate of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses in Anopheles. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. We also document evidence of variation in important reproductive phenotypes, genes controlling immunity to Plasmodium malaria parasites and other microbes, genes encoding cuticular and salivary proteins, and genes conferring metabolic insecticide resistance. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts. CONCLUSIONS: Anopheline mosquitoes exhibit a molecular evolutionary profile very distinct from Drosophila, and their genomes harbor strong evidence of functional variation in traits that determine vectorial capacity. These 16 new reference genome assemblies provide a foundation for hypothesis generation and testing to further our understanding of the diverse biological traits that determine vectorial capacity.
Figure: Geography, vector status, and molecular phylogeny of the 16 newly sequenced anopheline mosquitoes and selected other dipterans. The maximum likelihood molecular phylogeny of all sequenced anophelines and two mosquito outgroups was constructed from the aligned protein sequences of 1085 single-copy orthologs. Shapes between branch termini and species names indicate vector status and are colored according to geographic ranges depicted on the map.
OrthoDB is a comprehensive catalog of orthologs, genes inherited by extant species from a single gene in their last common ancestor. In 2016 OrthoDB reached its 9th release, growing to over 22 million genes from over 5000 species, now adding plants, archaea and viruses. In this update we focused on usability of this fast-growing wealth of data: updating the user and programmatic interfaces to browse and query the data, and further enhancing the already extensive integration of available gene functional annotations. Collating functional annotations from over 100 resources enabled us to propose descriptive titles for 87% of ortholog groups. Additionally, OrthoDB continues to provide computed evolutionary annotations and to allow user queries by sequence homology. The OrthoDB resource now enables users to generate publication-quality comparative genomics charts, as well as to upload, analyze and interactively explore their own private data. OrthoDB is available from http://orthodb.org.