Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

What is a gene, post-ENCODE? History and updated definition
While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century, from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition sidesteps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also shows how integral the concept of biological function is in defining genes.

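The proposed definition can be made concrete with a small data structure. The following is a minimal Python sketch, not the authors' method, assuming that each final functional product is represented by the genomic intervals that encode it and that a "coherent set" means products sharing encoding sequence directly or transitively. The Product class, the function names, and the coordinates are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A final functional product (e.g. a protein or ncRNA) and the genomic
    intervals that encode it. Intervals are half-open (start, end) on one
    chromosome; strand is ignored in this sketch."""
    name: str
    chrom: str
    intervals: tuple[tuple[int, int], ...]


def overlaps(a: Product, b: Product) -> bool:
    """True if the two products share at least one encoding base."""
    if a.chrom != b.chrom:
        return False
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.intervals
               for s2, e2 in b.intervals)


def merge_intervals(intervals):
    """Union of half-open intervals as a sorted list; touching intervals are joined."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def group_into_genes(products):
    """Group products that overlap directly or transitively (union-find);
    each group, with the union of its encoding sequence, is one 'gene'."""
    parent = list(range(len(products)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(products)):
        for j in range(i + 1, len(products)):
            if overlaps(products[i], products[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, product in enumerate(products):
        groups.setdefault(find(i), []).append(product)

    return [
        {
            "products": sorted(p.name for p in members),
            "chrom": members[0].chrom,
            "sequence": merge_intervals(iv for p in members for iv in p.intervals),
        }
        for members in groups.values()
    ]


# Toy example with made-up coordinates: two overlapping isoforms and one
# separate noncoding RNA.
products = [
    Product("isoformA", "chrI", ((100, 200), (300, 400))),
    Product("isoformB", "chrI", ((150, 250),)),
    Product("ncRNA_X", "chrI", ((1000, 1100),)),
]
genes = group_into_genes(products)
for gene in genes:
    print(gene["products"], gene["sequence"])
```

In this toy case the two isoforms share encoding bases, so they collapse into one gene whose sequence is the union [(100, 250), (300, 400)], while the noncoding RNA forms its own gene. Note that the grouping is driven only by the final products; regulatory regions play no part, in line with the definition above.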
Deciphering the bat virome catalog to better understand the ecological diversity of bat viruses and the bat origin of emerging infectious diseases
Zhìqiáng Wú, Li Yang, Xianwen Ren et al. | The ISME Journal | 2015
Studies have demonstrated that approximately 60% to 80% of emerging infectious diseases (EIDs) in humans originated in wildlife. Bats are natural reservoirs of a large variety of viruses, including many important zoonotic viruses that cause severe diseases in humans and domestic animals. However, the viral populations residing in bats and their ecological diversity remain poorly understood, which complicates efforts to determine the origins of certain EIDs. Here, using bats as a typical wildlife reservoir model, virome analysis was conducted on pharyngeal and anal swab samples from 4,440 individual bats of 40 major bat species throughout China. The purpose of this study was to survey the ecological and biological diversity of viruses residing in these bat species, to investigate the presence of potential bat-borne zoonotic viruses, and to evaluate the impact of these viruses on public health. The data obtained provide an overview of the viral community present in these bat samples. Many novel bat viruses were reported for the first time, and some bat viruses closely related to known human or animal pathogens were identified. This genetic evidence provides new clues in the search for the origins and evolutionary patterns of certain viruses, such as coronaviruses and noroviruses. These data offer meaningful ecological information for predicting and tracing wildlife-originated EIDs.

Virome Analysis for Identification of Novel Mammalian Viruses in Bat Species from Chinese Provinces
Zhìqiáng Wú, Xianwen Ren, Li Yang et al. | Journal of Virology | 2012
Bats are natural hosts for a large variety of zoonotic viruses. This study aimed to describe the range of bat viromes, including viruses from mammals, insects, fungi, plants, and phages, in 11 insectivorous bat species (216 bats in total) common in six provinces of China. To analyze viromes, we used sequence-independent PCR amplification and next-generation sequencing technology (Solexa Genome Analyzer II; Illumina). The viromes were identified by sequence similarity comparisons to known viruses. The mammalian viruses included those of the Adenoviridae, Herpesviridae, Papillomaviridae, Retroviridae, Circoviridae, Rhabdoviridae, Astroviridae, Flaviviridae, Coronaviridae, Picornaviridae, and Parvovirinae; insect viruses included those of the Baculoviridae, Iflaviridae, Dicistroviridae, Tetraviridae, and Densovirinae; fungal viruses included those of the Chrysoviridae, Hypoviridae, Partitiviridae, and Totiviridae; and phages included those of the Caudovirales, Inoviridae, and Microviridae, as well as unclassified phages. In addition to the viruses and phages associated with the insects, plants, and bacterial flora related to the diet and habitat of bats, we identified the complete or partial genome sequences of 13 novel mammalian viruses. These included herpesviruses, papillomaviruses, a circovirus, a bocavirus, picornaviruses, a pestivirus, and a foamy virus. Pairwise alignments and phylogenetic analyses indicated that these novel viruses showed little genetic similarity to previously reported viruses. This study also revealed a high prevalence and diversity of bat astroviruses and coronaviruses in some provinces. These findings expand our understanding of the viromes of bats in China and hint at the presence of a large variety of unknown mammalian viruses in many common bat species of mainland China.

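The abstract states only that viromes were identified "by sequence similarity comparisons to known viruses" and does not describe the tools used. As a rough illustration of the idea, here is a minimal Python sketch that swaps in a simple k-mer containment score (the fraction of a read's k-mers found in a reference) in place of the alignment-based search a real pipeline would typically use. The function names, the k and threshold values, and the toy reference sequences are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(read_kmers: set[str], ref_kmers: set[str]) -> float:
    """Fraction of the read's k-mers that also occur in the reference."""
    if not read_kmers:
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def classify_read(read, references, k=8, min_score=0.5):
    """Assign a read to the reference family with the highest k-mer
    containment; return (None, score) if no family reaches min_score."""
    read_kmers = kmers(read, k)
    best_family, best_score = None, 0.0
    for family, ref_seq in references.items():
        score = containment(read_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_family, best_score = family, score
    if best_score < min_score:
        return None, best_score
    return best_family, best_score


# Made-up toy "reference genomes"; real analyses would use curated viral
# databases and an aligner.
references = {
    "family_A": "ACGTACGTTGCAAGGCTTACGATCGATCGGCTA",
    "family_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}
print(classify_read("GCAAGGCTTACGATCG", references))  # ('family_A', 1.0)
print(classify_read("ACACACACACACACAC", reads := references) if False else classify_read("ACACACACACACACAC", references))  # (None, 0.0)
```

Containment is used rather than Jaccard similarity because a short read compared with a much longer genome would always receive a low Jaccard score even when it is fully contained. Reads that match no reference well, like the second one, are the kind that would remain unassigned or be flagged as potentially novel.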
Comparative analysis of rodent and small mammal viromes to better understand the wildlife origin of emerging infectious diseases
Zhìqiáng Wú, Liang Lu, Jiang Du et al. | Microbiome | 2018
BACKGROUND: Rodents represent around 43% of all mammalian species, are widely distributed, and are the natural reservoirs of a diverse group of zoonotic viruses, including hantaviruses, Lassa viruses, and tick-borne encephalitis viruses. Thus, analyzing the viral diversity harbored by rodents could assist efforts to predict and reduce the risk of future emergence of zoonotic viral diseases.
RESULTS: We used next-generation sequencing metagenomic analysis to survey for a range of mammalian viral families in rodents and other small animals of the orders Rodentia, Lagomorpha, and Soricomorpha in China. We sampled 3,055 small animals from 20 provinces and then outlined the spectra of mammalian viruses within these individuals and the basic ecological and genetic characteristics of novel rodent and shrew viruses among the viral spectra. Further analysis revealed that host taxonomy plays a primary role and geographical location a secondary role in determining viral diversity. Many viruses were reported for the first time with distinct evolutionary lineages, and viruses related to known human or animal pathogens were identified. Phylogram comparison between viruses and hosts indicated that host shifts occurred frequently among many different species during viral evolutionary history.
CONCLUSIONS: These results expand our understanding of the viromes of rodents and insectivores in China and suggest that a high diversity of viruses awaits discovery in these species in Asia. These findings, combined with our previous bat virome data, greatly increase our knowledge of the viral community in wildlife in a densely populated country in an emerging disease hotspot.