SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning

Advait Balaji (Rice University), Bryce Kille (Rice University), Anthony D. Kappell (Signature Research (United States)), Gene D. Godbold (Signature Research (United States)), Madeline Diep (Fraunhofer USA Center Mid-Atlantic CMA), R. A. Leo Elworth (Rice University), Zhiqin Qian (Rice University), Dreycey Albin (Rice University), Daniel J. Nasko (University of Maryland, College Park), Nidhi Shah (University of Maryland, College Park), Mihai Pop (University of Maryland, College Park), Santiago Segarra (Rice University), Krista L. Ternus (Signature Research (United States)), Todd J. Treangen (Rice University)
Genome Biology
June 20, 2022
Cited by 33 | Open Access

Abstract

The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.

