The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand-km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome (despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique); (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preferences. We present novel methods for measuring the genomic similarity between metagenomic samples and show how the samples may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, reflecting the poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
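The near-linear discovery rate mentioned in this abstract can be pictured as a collector's curve: the cumulative number of distinct families seen as sequences are added one at a time. The following is a minimal sketch only, assuming each sequence already carries a family label. The function name `discovery_curve` and the toy labels are hypothetical illustrations, not part of the GOS analysis or its data.

```python
from typing import Hashable, Iterable, List


def discovery_curve(family_labels: Iterable[Hashable]) -> List[int]:
    """Cumulative count of distinct families after each added sequence.

    `family_labels` is a hypothetical input: one family label per
    sequence, in the order the sequences are added.
    """
    seen = set()
    curve = []
    for label in family_labels:
        seen.add(label)          # a repeated label does not grow the set
        curve.append(len(seen))  # distinct families observed so far
    return curve


# Toy, made-up labels (NOT GOS data), used only to show the shape of the output.
toy_labels = ["A", "B", "A", "C", "D", "B", "E", "F"]
toy_curve = discovery_curve(toy_labels)
print(toy_curve)
```

A curve that keeps rising roughly in proportion to the number of sequences, rather than flattening, is what a linear or almost linear discovery rate would look like under this simplified picture.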
Genomic and functional adaptation in surface ocean planktonic prokaryotes
The understanding of marine microbial ecology and metabolism has been hampered by the paucity of sequenced reference genomes. To help address this gap, we report the sequencing of 137 diverse marine isolates collected from around the world. We analysed these sequences, along with previously published marine prokaryotic genomes, in the context of marine metagenomic data, to gain insights into the ecology of the surface ocean prokaryotic picoplankton (0.1–3.0 μm size range). The results suggest that the sequenced genomes define two microbial groups: one composed of only a few taxa that are nearly always abundant in picoplanktonic communities, and the other consisting of many microbial taxa that are rarely abundant. The genomic content of the second group suggests that these microbes are capable of slow growth and survival in energy-limited environments, and rapid growth in energy-rich environments. By contrast, the abundant and cosmopolitan picoplanktonic prokaryotes for which there is genomic representation have smaller genomes, are probably capable of only slow growth and seem to be relatively unable to sense or rapidly acclimate to energy-rich conditions. Their genomic features also lead us to propose that one method used to avoid predation by viruses and/or bacterivores is by means of slow growth and the maintenance of low biomass.
Using newly derived genome sequences of 137 microbial isolates collected from a variety of marine environments around the world, together with previously obtained genome and metagenome data, Shibu Yooseph and colleagues have obtained an overview of the ecology of the ocean surface-dwelling plankton community. Two main microbial groups emerge. The first contains many microbial taxa that are rarely abundant and seem to be adapted to a 'feast or famine' lifestyle of rapid growth in energy-rich environments and slow growth during food scarcity. The second group consists of a few taxa of abundant and cosmopolitan plankton that are nearly always plentiful. These largely uncultured microbes have relatively small genomes and may avoid predation by growing slowly and maintaining low biomass.
Using newly derived genome sequences of 137 marine microbial isolates as well as previously obtained genome and metagenome data, this study presents a functional analysis of picoplankton residing in the ocean's surface layer.
The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples
Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans, such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Sampling Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus, supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function, and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.
Seasonal Variation in Lysogeny as Depicted by Prophage Induction in Tampa Bay, Florida
Shannon J. Williamson, L. A. Houchin, Lauren D. McDaniel et al. | Applied and Environmental Microbiology | 2002
A seasonal study of the distribution of lysogenic bacteria in Tampa Bay, Florida, was conducted over a 13-month period. Biweekly water samples were collected and either were left unaltered or had the viral population reduced by filtration (pore size, 0.2 μm) and resuspension in filtered (pore size, 0.2 μm) water. Virus-reduced and unaltered samples were then treated by adding mitomycin C (0.5 μg ml⁻¹) to induce prophage or were left untreated. In order to test the hypothesis that prophage induction was phosphate limited, additional induction experiments were performed in the presence and absence of phosphate. Induction was assessed as an increase in viral direct counts, relative to those obtained in controls, as detected by epifluorescence microscopy. Induction of prophage was observed in 5 of 25 (20%) unaltered samples, which were obtained during or after the month of February, paralleling the results from a previous seasonal study. Induction of prophage was observed in 9 of 25 (36%) virus-reduced samples, primarily those obtained in the winter months, which was not observed in a prior seasonal study (P. K. Cochran and J. H. Paul, Appl. Environ. Microbiol. 64:2308-2312, 1998). Induction was noted in the months of lowest bacterial and primary production, suggesting that lysogeny was favored under conditions of poor host growth. Phosphate addition enabled prophage induction in two of nine (22%) experiments. These results indicate that prophage induction may occasionally be phosphate limited or respond to increases in phosphate concentration, suggesting that phosphate concentration may modulate the lysogenic response of natural populations.
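The induction percentages quoted in this abstract follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them. The counts are taken from the abstract; the dictionary keys and the helper name `induction_percent` are illustrative assumptions, not terms from the study.

```python
# Counts quoted in the abstract: (samples showing induction, samples tested).
# The key names are illustrative labels chosen for this sketch.
induction_counts = {
    "unaltered": (5, 25),
    "virus_reduced": (9, 25),
    "phosphate_addition": (2, 9),
}


def induction_percent(induced: int, tested: int) -> float:
    """Percentage of tested samples in which prophage induction was seen."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * induced / tested


percentages = {
    name: induction_percent(induced, tested)
    for name, (induced, tested) in induction_counts.items()
}

for name, pct in percentages.items():
    # Rounded to whole percent, matching the precision used in the abstract.
    print(f"{name}: {pct:.0f}%")
```

Rounded to whole numbers, the three values agree with the 20%, 36% and 22% stated in the text.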