Comparative Genomic Analyses of Seventeen Streptococcus pneumoniae Strains: Insights into the Pneumococcal Supragenome

N. Luisa Hiller (Allegheny General Hospital), Benjamin Janto (Allegheny General Hospital), Justin Hogg (Allegheny General Hospital), Robert Boissy (Allegheny General Hospital), Susan Yu (Allegheny General Hospital), Evan Powell (Allegheny General Hospital), Randy Keefe (Allegheny General Hospital), Nathan Ehrlich (Allegheny General Hospital), Kai Shen (Allegheny General Hospital), Jay Hayes (Allegheny General Hospital), Karen A. Barbadora (Children's Hospital of Pittsburgh), William Klimke (National Institutes of Health), Dmitry Dernovoy (National Institutes of Health), Tatiana Tatusova (National Institutes of Health), Julian Parkhill (Wellcome Sanger Institute), Stephen D. Bentley (Wellcome Sanger Institute), J. Christopher Post (Allegheny General Hospital), Garth D. Ehrlich (Allegheny General Hospital), Fen Hu (Allegheny General Hospital)
Journal of Bacteriology
August 4, 2007
Cited by 301
Open Access

Abstract

The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.
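
The cluster counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the core and noncore fractions from the reported counts. It also evaluates a deliberately simple sampling formula, 1 - (1 - f)^n, for the chance that a cluster present at population frequency f appears in at least one of n genomes. That formula assumes each genome carries the cluster independently; it is an illustrative assumption and not the finite-supragenome model used in the paper, so it is not meant to reproduce the 99% figure. The function names are hypothetical.

```python
from typing import Tuple


def cluster_fractions(core: int, noncore: int) -> Tuple[int, float, float]:
    """Return (total clusters, core fraction, noncore fraction)."""
    total = core + noncore
    return total, core / total, noncore / total


def detection_probability(frequency: float, n_genomes: int) -> float:
    """Chance that a cluster is seen in at least one of n_genomes.

    Assumption (not from the paper): each genome carries the cluster
    independently with probability `frequency`.
    """
    return 1.0 - (1.0 - frequency) ** n_genomes


if __name__ == "__main__":
    # Counts reported in the abstract: 1,454 core and 1,716 noncore clusters.
    total, core_frac, noncore_frac = cluster_fractions(1454, 1716)
    print(f"total clusters: {total}")
    print(f"core: {core_frac:.1%}, noncore: {noncore_frac:.1%}")

    # Lowest frequency class considered in the abstract (0.1), 33 genomes.
    p = detection_probability(0.1, 33)
    print(f"detection probability at f=0.1, n=33: {p:.3f}")
```

Under this simplified assumption, clusters at exactly the 0.1 threshold are the hardest to detect, and clusters at higher frequencies are found with higher probability.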
