ProPhyle: A Phylogeny-Based Metagenomic Classifier Using the Burrows-Wheeler Transform

Karel Břinda (Dana-Farber/Harvard Cancer Center), Kamil Salikhov (Paris-Est Sup), Simone Pignotti (Paris-Est Sup), Grégory Kucherov (Centre National de la Recherche Scientifique)
Zenodo (CERN European Organization for Nuclear Research)
July 24, 2017
Cited by 4 · Open Access

Abstract

Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by Next-Generation Sequencing technologies. The aim of metagenomic classification is to assign each sequence of the metagenome to a corresponding taxonomic unit, or to classify it as “novel”. To cope with increasingly large metagenomic projects, researchers resort to alignment-free methods. The most popular tool, Kraken, provides extremely rapid read classification, but its index suffers from two major limitations: enormous memory consumption and a lossy representation of <em>k</em>-mers through their lowest common ancestors. We present ProPhyle, a metagenomic classifier based on the Burrows-Wheeler Transform. ProPhyle uses a classification algorithm similar to Kraken's but with an indexing strategy based on a bottom-up propagation of <em>k</em>-mers in the tree, assembling contigs at each node and matching using a standard full-text search. The resulting index requires only a fraction of the RAM needed by Kraken: 13 GB instead of 90 GB for index construction and 14 GB instead of 72 GB for index querying. The index is also more expressive, allowing users to retrieve a list of <em>all</em> genomes for every queried <em>k</em>-mer. Overall, ProPhyle provides an index for resource-frugal metagenomic classification, which is accurate even with single-species phylogenetic trees. ProPhyle is available at http://github.com/karel-brinda/prophyle, released under the MIT license.
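
The bottom-up propagation described in the abstract can be illustrated with a small Python sketch. It is not ProPhyle's implementation: it assumes that propagation moves the <em>k</em>-mers shared by all children of a node up to that node, and it uses plain Python sets in place of the contig assembly and BWT-based full-text search. The tree, the toy sequences, and the function names (`kmers`, `propagate`, `leaves_under`, `genomes_with`) are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy tree: internal node -> children; leaves have no entry.
Tree = Dict[str, List[str]]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers of seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def propagate(tree: Tree, root: str, genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Assumed propagation rule: k-mers shared by ALL children of a node
    are moved up to that node and removed from the children."""
    index: Dict[str, Set[str]] = {}

    def visit(node: str) -> None:
        children = tree.get(node, [])
        if not children:  # leaf: start from the genome's own k-mers
            index[node] = kmers(genomes[node], k)
            return
        for child in children:
            visit(child)
        shared = set.intersection(*(index[c] for c in children))
        for child in children:
            index[child] -= shared
        index[node] = shared

    visit(root)
    return index


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Return all leaves (genomes) in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(tree, c) for c in children))


def genomes_with(tree: Tree, index: Dict[str, Set[str]], kmer: str) -> List[str]:
    """List every genome containing kmer: the leaves below each node holding it."""
    hits: Set[str] = set()
    for node, node_kmers in index.items():
        if kmer in node_kmers:
            hits |= leaves_under(tree, node)
    return sorted(hits)


# Toy data (made up for illustration only).
tree: Tree = {"root": ["A", "g3"], "A": ["g1", "g2"]}
genomes = {"g1": "ACGTA", "g2": "ACGTT", "g3": "GTACG"}
index = propagate(tree, "root", genomes, k=3)

print(genomes_with(tree, index, "GTA"))
```

In this toy example, "GTA" occurs in g1 and g3 but not in g2. The propagated index keeps it at both leaves, so the query returns exactly those two genomes, whereas a lowest-common-ancestor representation would record only the root and could no longer tell that g2 lacks the <em>k</em>-mer.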

