Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models

Michael R. Garvin (Oak Ridge National Laboratory), Érica T. Prates (Oak Ridge National Laboratory), Mirko Pavicic (Oak Ridge National Laboratory), Piet Jones (Oak Ridge National Laboratory), B Kirtley Amos (Oak Ridge National Laboratory), Armin Geiger (Oak Ridge National Laboratory), Manesh Shah (Oak Ridge National Laboratory), Jared Streich (Oak Ridge National Laboratory), João Gabriel Felipe Machado Gazolla (Oak Ridge National Laboratory), David Kainer (Oak Ridge National Laboratory), Ashley Cliff (Oak Ridge National Laboratory), Jonathon Romero (Oak Ridge National Laboratory), Nathan Keith (Lawrence Berkeley National Laboratory), James B. Brown (Lawrence Berkeley National Laboratory), Daniel Jacobson (Oak Ridge National Laboratory)
Genome Biology
December 1, 2020
Cited by 71; Open Access

Abstract

Background: A mechanistic understanding of the spread of SARS-CoV-2 and diligent tracking of ongoing mutagenesis are of key importance to plan robust strategies for confining its transmission. Large numbers of available sequences and their dates of transmission provide an unprecedented opportunity to analyze evolutionary adaptation in novel ways. Addition of high-resolution structural information can reveal the functional basis of these processes at the molecular level. Integrated systems biology-directed analyses of these data layers afford valuable insights to build a global understanding of the COVID-19 pandemic.

Results: Here we identify globally distributed haplotypes from 15,789 SARS-CoV-2 genomes and model their success based on their duration, dispersal, and frequency in the host population. Our models identify mutations that are likely compensatory adaptive changes that allowed for rapid expansion of the virus. Functional predictions from structural analyses indicate that, contrary to previous reports, the Asp 614 Gly mutation in the spike glycoprotein (S) likely reduced transmission and the subsequent Pro 323 Leu mutation in the RNA-dependent RNA polymerase led to the precipitous spread of the virus. Our model also suggests that two mutations in the nsp13 helicase allowed for the adaptation of the virus to the Pacific Northwest of the USA. Finally, our explainable artificial intelligence algorithm identified a mutational hotspot in the sequence of S that also displays a signature of positive selection and may have implications for tissue or cell-specific expression of the virus.

Conclusions: These results provide valuable insights for the development of drugs and surveillance strategies to combat the current and future pandemics.

