LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons
BACKGROUND: Transposable elements are abundant in eukaryotic genomes and it is believed that they have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is a considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs).
RESULTS: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR retrotransposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and a specificity of 100% and 72%, respectively. This is comparable to or slightly better than the annotations produced by previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software.
CONCLUSION: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approximately 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects.
Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve the prediction of LTR retrotransposons.
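The feature-based prediction described in the LTRharvest abstract (LTR length and distance) can be illustrated with a minimal Python sketch. This is not LTRharvest's implementation: the `RepeatPair` type, the function names, the half-open coordinates, the definition of distance as the offset between the two LTR start positions, and all threshold values are assumptions chosen for illustration only.

```python
from typing import List, NamedTuple


class RepeatPair(NamedTuple):
    """Hypothetical record for a pair of similar sequence stretches.

    Coordinates are 0-based and half-open: [start, end).
    """
    left_start: int
    left_end: int
    right_start: int
    right_end: int


def is_ltr_candidate(
    pair: RepeatPair,
    min_ltr_len: int = 100,    # illustrative threshold (assumption)
    max_ltr_len: int = 1000,   # illustrative threshold (assumption)
    min_dist: int = 1000,      # illustrative threshold (assumption)
    max_dist: int = 15000,     # illustrative threshold (assumption)
) -> bool:
    """Return True if both repeats and their spacing fit the constraints."""
    left_len = pair.left_end - pair.left_start
    right_len = pair.right_end - pair.right_start
    # Distance is measured here between the two start positions (assumption).
    distance = pair.right_start - pair.left_start
    lengths_ok = all(
        min_ltr_len <= n <= max_ltr_len for n in (left_len, right_len)
    )
    return lengths_ok and min_dist <= distance <= max_dist


def filter_candidates(pairs: List[RepeatPair], **limits: int) -> List[RepeatPair]:
    """Keep only the repeat pairs that satisfy the length/distance limits."""
    return [p for p in pairs if is_ltr_candidate(p, **limits)]


if __name__ == "__main__":
    pairs = [
        RepeatPair(0, 300, 5000, 5300),  # plausible LTR pair
        RepeatPair(0, 50, 5000, 5050),   # repeats too short
        RepeatPair(0, 300, 400, 700),    # repeats too close together
    ]
    print(filter_candidates(pairs))
```

In this sketch the limits are keyword arguments, so a different organism-specific range could be tried with, for example, `filter_candidates(pairs, max_dist=25000)`. Sequence motif checks, which the abstract also mentions, are deliberately left out.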
The genome of the protist parasite Entamoeba histolytica
The genome sequence of the pathogen Entamoeba histolytica is reported this week. E. histolytica causes amoebiasis, the second most deadly protozoan disease after malaria. The genome contains adaptations shared with other anaerobic pathogens such as Trichomonas and Giardia. There is also evidence that the genome has been shaped by many gene transfers from bacteria, which may suggest possible targets for drugs against these organisms. The identification of a large number of sensing and signalling proteins challenges the idea that E. histolytica is a simple organism: in fact it is finely attuned to its environment.
Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries [1]. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome.
Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.
Fine-grained annotation and classification of de novo predicted LTR retrotransposons
Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a workflow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.
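To give a concrete sense of what a polypurine tract looks like at the sequence level, the following Python sketch reports uninterrupted runs of purines (A or G) in a DNA string. It is a deliberately simplified stand-in and not the method used by LTRdigest, which the abstract describes as relying on local alignment and hidden Markov model-based algorithms. The function name `find_purine_runs` and the `min_len` default are assumptions made for illustration.

```python
import re
from typing import List, Tuple


def find_purine_runs(seq: str, min_len: int = 8) -> List[Tuple[int, int]]:
    """Return (start, end) spans of uninterrupted A/G runs of at least min_len.

    Coordinates are 0-based and half-open. This is a naive exact-run scan,
    not a probabilistic model; min_len is an illustrative value.
    """
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with one purine-rich stretch in the middle.
    toy = "CTTCAAGGGAGGAGACTTC"
    for start, end in find_purine_runs(toy):
        print(start, end, toy[start:end])
```

A real annotation would also consider the position of the tract relative to the 3' LTR and tolerate occasional pyrimidines, neither of which this sketch attempts.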
Identification of the Sex-Determining Region of the Ceratitis capitata Y Chromosome by Deletion Mapping
In the medfly Ceratitis capitata, the Y chromosome is responsible for determining the male sex. We have mapped the region containing the relevant factor through the analysis of Y-autosome translocations using fluorescence in situ hybridization with two different probes. One probe, the clone pY114, contains repetitive, Y-specific DNA sequences from C. capitata, while the second clone, pDh2-H8, consists of ribosomal DNA sequences from Drosophila hydei. Clone pY114 labels most of the long arm, and pDh2-H8 hybridizes to the short arm and the centromeric region of the long arm. In 12 of the 19 analyzed Y-autosome translocation strains, adjacent-1 segregation products survive to the late pupal or even adult stage and can, therefore, be sexed. The sex of these aberrant individuals was correlated with the length of the Y fragment still present in them, which allowed us to map the male-determining factor to a region of the long arm representing approximately 15% of the entire Y chromosome. No additional factors, affecting for example fertility, were detected outside the male-determining region.
Analysis of 5′ junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5′-end attachment requiring microhomology-mediated end-joining
Insertion of the human non-LTR retrotransposon LINE-1 (L1) into chromosomal DNA is thought to be initiated by a mechanism called target-primed reverse transcription (TPRT). This mechanism readily accounts for the attachment of the 3′-end of an L1 copy to the genomic target, but the subsequent integration steps leading to the attachment of the 5′-end to the chromosomal DNA are still a matter of speculation. By applying bioinformatics to analyze the 5′ junctions of recent L1 insertions in the human genome, we provide evidence that L1 uses at least two distinct mechanisms to link the 5′-end of the nascent L1 copy to its genomic target. While 5′-truncated L1 elements show a statistically significant preference for short patches of overlapping nucleotides between their target site and the point of truncation, full-length insertions display no distinct bias for such microhomologies at their 5′-ends. In a second genome-wide approach, we analyzed Alu elements to examine whether these nonautonomous retrotransposons, which are thought to be mobilized through L1 proteins, show similar characteristics. We found that Alu elements appear to be predominantly integrated via a pathway not involving overlapping nucleotides. The results indicate that a cellular nonhomologous DNA end-joining pathway may resolve intermediates from incomplete L1 retrotransposition events and thus lead to 5′ truncations.
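The notion of "short patches of overlapping nucleotides" at a 5′ junction can be made concrete with a small Python sketch. It counts how many bases immediately upstream of the junction are identical in the genomic target sequence and in the reference element sequence just upstream of the truncation point. This is only one simplified way to quantify microhomology and is not taken from the study: the function name, both input strings, and the `max_len` cap are hypothetical.

```python
def microhomology_length(
    target_upstream: str, element_upstream: str, max_len: int = 10
) -> int:
    """Length of the shared suffix of two sequences, capped at max_len.

    target_upstream:  genomic sequence ending at the 5' junction (assumption).
    element_upstream: reference element sequence ending at the truncation
                      point (assumption).
    Both are read 5'->3'; the comparison walks backwards from the junction.
    """
    a = target_upstream.upper()
    b = element_upstream.upper()
    limit = min(len(a), len(b), max_len)
    n = 0
    # Step away from the junction while the two sequences still agree.
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


if __name__ == "__main__":
    # Toy inputs: the last four bases (TGCA) agree, the fifth does not.
    print(microhomology_length("ACGTTGCA", "GGGATGCA"))
```

To judge whether observed overlaps are more frequent than expected, the resulting lengths would have to be compared against a suitable background, for example junctions with shuffled flanking sequence. That statistical step is outside the scope of this sketch.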