Petabase-scale sequence alignment catalyses viral discovery

R. C. Edgar, Brie Taylor, Victor S.-Y. Lin, Tomer Altman (Verisk Analytics (United States)), Pierre Barbera (Heidelberg Institute for Theoretical Studies), Dmitry Meleshko (St Petersburg University), Dan Lohr (AID Atlanta), Gherman Novakovsky (University of British Columbia), Benjamin Buchfink (Max Planck Institute for Biology), Basem Al-Shayeb (University of California, Berkeley), Jillian F. Banfield (Planetary Science Institute), Marcos de la Peña (Instituto de Biología Molecular y Celular de Plantas), Anton Korobeynikov (St Petersburg University), Rayan Chikhi (Institut Pasteur), Artem Babaian (Oldham Council)
Nature
January 26, 2022
Cited by 522, Open Access

Abstract

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Serratus, an open-source cloud-computing infrastructure, can be used to screen millions of nucleic acid sequencing libraries at the petabase scale, and has enabled many new RNA viruses to be identified efficiently.

