Determining the quality and complexity of next-generation sequencing data without a reference genome

Seyed Yahya Anvar (Leiden University Medical Center), Lusine Khachatryan (Leiden University Medical Center), Martijn Vermaat (Leiden University Medical Center), Michiel van Galen (Leiden University Medical Center), Irina Pulyakhina (Leiden University Medical Center), Yavuz Ariyürek (Leiden University Medical Center), Ken Kraaijeveld (Leiden University Medical Center), Johan T. den Dunnen (Leiden University Medical Center), Peter de Knijff (Leiden University Medical Center), Peter A.C. ’t Hoen (Leiden University Medical Center), Jeroen F. J. Laros (Leiden University Medical Center)
Genome Biology
December 16, 2014
Cited by 33. Open Access.

Abstract

We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
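
To make the idea of comparing libraries by k-mer frequencies concrete, the following is a minimal sketch in plain Python. It is not kPAL's implementation or API: the function names `kmer_profile` and `profile_distance` are hypothetical, and the Euclidean distance on relative frequencies is an assumption chosen for simplicity rather than the measure used by kPAL.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Count every k-mer made only of A/C/G/T in an iterable of read strings."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def profile_distance(a, b):
    """Euclidean distance between two profiles scaled to relative frequencies.

    Assumption: this measure is illustrative only, not the one kPAL uses.
    """
    total_a = sum(a.values()) or 1  # avoid division by zero for empty profiles
    total_b = sum(b.values()) or 1
    kmers = set(a) | set(b)
    return sqrt(sum((a[m] / total_a - b[m] / total_b) ** 2 for m in kmers))


if __name__ == "__main__":
    # Toy reads standing in for two sequencing libraries (made-up data).
    library_a = ["ACGTACGTAC", "TTGACATTGA"]
    library_b = ["ACGTACGTAC", "GGGGGGGGGG"]
    pa = kmer_profile(library_a, k=3)
    pb = kmer_profile(library_b, k=3)
    print(profile_distance(pa, pb))
```

Scaling each profile to relative frequencies before comparison keeps the distance from being dominated by differences in sequencing depth, and no reference sequence is needed at any step.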

