Comprehensive genome annotation of the model ciliate <i>Tetrahymena thermophila</i> by in-depth epigenetic and transcriptomic profiling

Fei Ye (Qingdao National Laboratory for Marine Science and Technology), Xiao Chen (Shandong University), Yuan Li (Qingdao National Laboratory for Marine Science and Technology), Aili Ju (Qingdao National Laboratory for Marine Science and Technology), Yalan Sheng (Hong Kong Baptist University), Lili Duan (Qingdao National Laboratory for Marine Science and Technology), Jiachen Zhang (Qingdao National Laboratory for Marine Science and Technology), Zhe Zhang (Qingdao National Laboratory for Marine Science and Technology), Khaled A. S. Al‐Rasheid (King Saud University), Naomi A. Stover (Bradley University), Shan Gao (Qingdao National Laboratory for Marine Science and Technology)
Nucleic Acids Research
December 9, 2024

Abstract

The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote that has contributed significantly to foundational biological discoveries. Despite its acknowledged importance, current studies on Tetrahymena biology face challenges due to gene annotation inaccuracy, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To ascertain transcript orientation and transcription start/end sites, we incorporated data on epigenetic marks displaying enrichment towards the 5' end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), histone variant H2A.Z, nucleosome positioning and N6-methyldeoxyadenine (6mA). Cap-seq data were subsequently applied to validate the accuracy of identified transcription start sites. Additionally, we integrated Nanopore direct RNA sequencing (DRS), strand-specific RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. Using a newly developed bioinformatic pipeline, coupled with manual curation and experimental validation, our work yielded substantial improvements to the current gene models, including the addition of 2,481 new genes, updates to 23,936 existing genes, and the incorporation of 8,339 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,687 high-confidence genes. Intriguingly, 20% of protein-coding genes were found to have natural antisense transcripts characterized by high diversity in alternative splicing, thus offering insights into transcriptional regulation. Our work will enhance the utility of Tetrahymena as a robust genetic toolkit for advancing biological research, and provides a promising framework for genome annotation in other eukaryotes.

