Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data

Johan Bengtsson‐Palme(University of Gothenburg), Martin Ryberg(Uppsala University), Martin Hartmann(Swiss Federal Institute for Forest, Snow and Landscape Research), Sara Branco(University of California, Berkeley), Zheng Wang(Yale University), Anna Godhe(University of Gothenburg), Pierre De Wit(University of Gothenburg), Marisol Sánchez‐García(University of Tennessee at Knoxville), Ingo Ebersberger(Goethe University Frankfurt), Filipe Sousa(University of Gothenburg), Anthony S. Amend(University of Hawaiʻi at Mānoa), Ari Jumpponen(Kansas State University), Martin Unterseher(Universität Greifswald), Erik Kristiansson(Chalmers University of Technology), Kessy Abarenkov(University of Tartu Natural History Museum and Botanical Garden), Yann Bertrand(University of Gothenburg), Kemal Sanli(University of Gothenburg), K. Martin Eriksson(Chalmers University of Technology), Unni Vik(University of Oslo), Vilmar Veldre, R. Henrik Nilsson(University of Gothenburg)
Methods in Ecology and Evolution
July 19, 2013
Cited by 1,399 | Open Access

Abstract

The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. Identifying and extracting ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2, as well as full-length ITS sequences, from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.
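
The extraction step described above, cutting out the spacers from the predicted positions of the flanking ribosomal genes, can be illustrated with a minimal Python sketch. This is not ITSx's code (ITSx is written in Perl) and does not reproduce its HMM search; the `GeneHits` type, the `extract_regions` function and the toy coordinates are hypothetical names and values introduced only for illustration. It assumes the gene boundaries have already been predicted and are given as 0-based, end-exclusive positions.

```python
from typing import Dict, NamedTuple, Optional


class GeneHits(NamedTuple):
    """Hypothetical predicted gene boundaries (0-based, end-exclusive)."""
    ssu_end: Optional[int]    # position where the small-subunit (SSU) gene ends
    s58_start: Optional[int]  # position where the 5.8S gene starts
    s58_end: Optional[int]    # position where the 5.8S gene ends
    lsu_start: Optional[int]  # position where the large-subunit (LSU) gene starts


def extract_regions(seq: str, hits: GeneHits) -> Dict[str, str]:
    """Slice out each region whose two flanking boundaries were both predicted."""
    # Each region is defined by the pair of boundaries that delimit it.
    bounds = {
        "ITS1": (hits.ssu_end, hits.s58_start),
        "5.8S": (hits.s58_start, hits.s58_end),
        "ITS2": (hits.s58_end, hits.lsu_start),
        "full_ITS": (hits.ssu_end, hits.lsu_start),
    }
    regions = {}
    for name, (start, end) in bounds.items():
        # Skip regions with a missing boundary or an inconsistent order.
        if start is not None and end is not None and start < end:
            regions[name] = seq[start:end]
    return regions


# Toy sequence: SSU tail, ITS1, 5.8S, ITS2, LSU head (not real rDNA).
toy_seq = "AAAA" + "cccccc" + "GGGGG" + "tttttt" + "AAAA"
toy_hits = GeneHits(ssu_end=4, s58_start=10, s58_end=15, lsu_start=21)
example = extract_regions(toy_seq, toy_hits)
```

In this sketch a region is simply omitted when one of its flanking genes was not detected, so a read lacking the SSU end would still yield 5.8S and ITS2 but neither ITS1 nor the full-length ITS. How ITSx itself handles partial detections is not stated in the abstract.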

