Christian J. Stoeckert

California University of Pennsylvania

ORCID: 0000-0002-5714-991X

Publishes on biomedical text mining and ontologies, bioinformatics and genomic networks, and gene expression and cancer classification. 212 papers and 24.1k citations.

Top publications by citations

OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Cited by 6.1k | Open Access

The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
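OrthoMCL's grouping step uses the Markov Cluster (MCL) algorithm. The sketch below is a generic, dense-matrix illustration of the MCL loop of expansion and inflation. It is not OrthoMCL's code. It assumes a symmetric matrix of pairwise similarity weights as input and does not show how OrthoMCL derives and normalizes those weights from sequence comparisons. The function name, parameter names and defaults (`expansion=2`, `inflation=2.0`, `prune`) are illustrative choices.

```python
import numpy as np


def mcl(similarity, expansion=2, inflation=2.0, max_iter=100, prune=1e-5):
    """Generic Markov Cluster (MCL) sketch on a symmetric, non-negative similarity matrix."""
    m = np.asarray(similarity, dtype=float) + np.eye(len(similarity))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)  # make columns stochastic (each sums to 1)
    for _ in range(max_iter):
        previous = m
        # Expansion: take `expansion` steps of the random walk (matrix power).
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: raise entries to a power and renormalize, favoring strong flows.
        m = m ** inflation
        m /= m.sum(axis=0, keepdims=True)
        # Pruning: drop negligible flows, then renormalize the columns again.
        m[m < prune] = 0.0
        m /= m.sum(axis=0, keepdims=True)
        if np.allclose(m, previous):
            break
    # Each non-empty row of the converged matrix (an "attractor") defines one cluster.
    clusters = []
    for row in m:
        members = sorted(np.nonzero(row > prune)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two unconnected groups of three mutually similar sequences (toy data).
similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(similarity))  # [[0, 1, 2], [3, 4, 5]]
```

Raising `inflation` tends to produce smaller, tighter clusters. A proteome-scale implementation would use sparse matrices rather than the dense arrays used here.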

PlasmoDB: a functional genomic database for malaria parasites
Cristina Aurrecoechea, John Brestelli, Brian P. Brunk et al.|Nucleic Acids Research|2008
Cited by 1.3k | Open Access

PlasmoDB (http://PlasmoDB.org) is a functional genomic database for Plasmodium spp. that provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB belongs to a family of genomic resources that are housed under the EuPathDB (http://EuPathDB.org) Bioinformatics Resource Center (BRC) umbrella. The latest release, PlasmoDB 5.5, contains numerous new data types from several broad categories: annotated genomes, evidence of transcription, proteomics evidence, protein function evidence, and population biology and evolution. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. Results can then be combined with each other on the query history page. Search results can be downloaded with associated functional data, and registered users can store their query history for future retrieval or analysis.

TriTrypDB: a functional genomic resource for the Trypanosomatidae
Martin Aslett, Cristina Aurrecoechea, Matthew Berriman et al.|Nucleic Acids Research|2009
Cited by 1k | Open Access

TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
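A stored search strategy of the kind described for TriTrypDB, like the result combinations on PlasmoDB's query history page, can be thought of as a tree of searches joined by set operations. The following is a minimal sketch under that assumption. Each search is taken to return a set of gene IDs, and steps are combined by intersection, union or set difference. The class names, operation labels, searches and gene IDs are all hypothetical. This is not the GUS/WDK API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Search:
    """A leaf step: one query that returns a set of gene IDs (hypothetical)."""
    name: str
    run: Callable[[], set]


@dataclass
class Combine:
    """An internal step combining two sub-strategies with a set operation."""
    op: str  # "INTERSECT", "UNION" or "MINUS"
    left: Step
    right: Step


Step = Union[Search, Combine]


def evaluate(step: Step) -> set:
    """Recursively evaluate a strategy tree to its final set of gene IDs."""
    if isinstance(step, Search):
        return step.run()
    left, right = evaluate(step.left), evaluate(step.right)
    if step.op == "INTERSECT":
        return left & right
    if step.op == "UNION":
        return left | right
    if step.op == "MINUS":
        return left - right
    raise ValueError(f"unknown operation: {step.op}")


# Hypothetical searches over toy gene IDs.
expressed = Search("expressed in a chosen life-cycle stage", lambda: {"gene1", "gene2", "gene3"})
secreted = Search("predicted signal peptide", lambda: {"gene2", "gene3", "gene4"})
with_orthologs = Search("has orthologs in another species", lambda: {"gene3"})

strategy = Combine("MINUS", Combine("INTERSECT", expressed, secreted), with_orthologs)
print(sorted(evaluate(strategy)))  # ['gene2']
```

Storing the tree rather than only its final result is one way to let a saved strategy be re-run or edited step by step later.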

Comparative genomics of the neglected human malaria parasite Plasmodium vivax
Cited by 857 | Open Access

The human malaria parasite Plasmodium vivax is responsible for 25–40% of the ∼515 million annual cases of malaria worldwide. Although seldom fatal, the parasite elicits severe and incapacitating clinical symptoms and often causes relapses months after a primary infection has cleared. Despite its importance as a major human pathogen, P. vivax is little studied because it cannot be propagated continuously in the laboratory except in non-human primates. We sequenced the genome of P. vivax to shed light on its distinctive biological features, and as a means to drive development of new drugs and vaccines. Here we describe the synteny and isochore structure of P. vivax chromosomes, and show that the parasite resembles other malaria parasites in gene content and metabolic potential, but possesses novel gene families and potential alternative invasion pathways not recognized previously. Completion of the P. vivax genome provides the scientific community with a valuable resource that can be used to advance investigation into this neglected species.

Four distinct Plasmodium species are known to regularly infect humans: Plasmodium falciparum, P. vivax, P. malariae and P. ovale. The genome sequence of P. falciparum, the cause of the most severe form of human malaria, was completed in 2002, at the same time as that of its mosquito vector, Anopheles gambiae. In this week's Nature, which focuses on the malaria parasite, two further malaria genome sequences are described. The first is that of P. vivax, which accounts for a substantial share of malaria cases in humans, although, in contrast to P. falciparum infection, the resulting disease is usually not fatal. The genome of this rather neglected species is presented together with a comparative analysis with the genomes of other Plasmodium species. The second is the genome sequence of Plasmodium knowlesi. Long regarded as a monkey malaria parasite, it is increasingly recognized as the fifth human-infecting Plasmodium species.
It is particularly prevalent in Southeast Asia, where it is often misdiagnosed as another human malaria parasite, P. malariae. As a model organism, P. knowlesi stands out: not only is it a primate system useful for vaccine work, but it can also be cultured in vitro and is amenable to efficient transfection and gene knockout. In a Review Article, Elizabeth Winzeler considers the progress made towards using the genome sequence to understand basic malaria parasite biology, and in particular the work on developing rational therapeutic approaches to combat P. falciparum infections.
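The P. vivax abstract refers to the isochore structure of its chromosomes. Isochores are long stretches of DNA with relatively homogeneous GC content. The following is a crude sketch of how such domains can be made visible. It assumes fixed-size windows and a single GC threshold, both of which are illustrative choices. The window size, the 0.5 threshold and the function names are hypothetical and are not the study's method. Published analyses use more careful segmentation.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_domains(seq: str, window: int = 10_000, threshold: float = 0.5):
    """Label fixed-size windows as GC-rich or AT-rich, then merge adjacent
    windows with the same label into (start, end, label) domains."""
    domains = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        label = "GC-rich" if gc_fraction(chunk) >= threshold else "AT-rich"
        end = start + len(chunk)
        if domains and domains[-1][2] == label:
            domains[-1] = (domains[-1][0], end, label)  # extend the current domain
        else:
            domains.append((start, end, label))
    return domains


# Toy sequence: AT-rich, then GC-rich, then AT-rich again.
toy = "AT" * 50 + "GC" * 50 + "AT" * 50
print(gc_domains(toy, window=100))
# [(0, 100, 'AT-rich'), (100, 200, 'GC-rich'), (200, 300, 'AT-rich')]
```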

PPARγ and C/EBP factors orchestrate adipocyte biology via adjacent binding on a genome-wide scale
Martina I. Lefterova, Yong Zhang, David J. Steger et al.|Genes & Development|2008
Cited by 825 | Open Access

Peroxisome proliferator-activated receptor gamma (PPARgamma), a nuclear receptor and the target of anti-diabetic thiazolidinedione drugs, is known as the master regulator of adipocyte biology. Although it regulates hundreds of adipocyte genes, PPARgamma binding to endogenous genes has rarely been demonstrated. Here, using chromatin immunoprecipitation (ChIP) coupled with whole-genome tiling arrays, we identified 5299 genomic regions of PPARgamma binding in mouse 3T3-L1 adipocytes. The consensus PPARgamma/RXRalpha "DR-1"-binding motif was found at most of the sites, and ChIP for RXRalpha showed colocalization at nearly all locations tested. Bioinformatics analysis also revealed CCAAT/enhancer-binding protein (C/EBP)-binding motifs in the vicinity of most PPARgamma-binding sites, and genome-wide analysis of C/EBPalpha binding demonstrated that it localized to 3350 of the locations bound by PPARgamma. Importantly, most genes induced in adipogenesis were bound by both PPARgamma and C/EBPalpha, while very few were PPARgamma-specific. C/EBPbeta also plays a role at many of these genes, such that C/EBPalpha and C/EBPbeta are both required along with PPARgamma for robust adipocyte-specific gene expression. Thus, PPARgamma and C/EBP factors cooperatively orchestrate adipocyte biology by adjacent binding on an unanticipated scale.
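The abstract reports DR-1 motifs at most PPARgamma sites and C/EBP motifs in their vicinity. The following is a simplified sketch of that kind of scan, for illustration only. It assumes exact consensus strings with a mismatch allowance. These are a DR-1 element written as two AGGTCA half-sites in direct orientation separated by one base, and a palindromic C/EBP consensus TTGCGCAA. Real analyses typically score position weight matrices instead. The function names, the one-mismatch default and the 50-bp adjacency window are hypothetical.

```python
DR1_CONSENSUS = "AGGTCANAGGTCA"  # assumed: two AGGTCA half-sites, 1-bp spacer (N = any base)
CEBP_CONSENSUS = "TTGCGCAA"      # assumed: palindromic C/EBP consensus

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def find_sites(seq: str, consensus: str, max_mismatches: int = 1) -> list:
    """Start positions where the consensus, on either strand, matches seq with at
    most max_mismatches mismatches; N in the consensus matches any base."""
    seq = seq.upper()
    k = len(consensus)
    hits = set()
    for motif in {consensus, revcomp(consensus)}:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            mismatches = sum(1 for m, b in zip(motif, window) if m != "N" and m != b)
            if mismatches <= max_mismatches:
                hits.add(i)
    return sorted(hits)


def adjacent_sites(seq: str, max_distance: int = 50, max_mismatches: int = 1) -> list:
    """(DR-1 start, C/EBP start) pairs whose start positions lie within max_distance bp."""
    dr1 = find_sites(seq, DR1_CONSENSUS, max_mismatches)
    cebp = find_sites(seq, CEBP_CONSENSUS, max_mismatches)
    return [(d, c) for d in dr1 for c in cebp if abs(d - c) <= max_distance]


# Toy region with one DR-1 element and one C/EBP site 23 bp apart.
region = "CCCC" + "AGGTCAAAGGTCA" + "C" * 10 + "TTGCGCAA" + "CCCC"
print(adjacent_sites(region))  # [(4, 27)]
```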