Michael Gribskov

Purdue University West Lafayette

ORCID: 0000-0002-1718-0242

Publishes on genomics and phylogenetic studies, RNA and protein synthesis mechanisms, and machine learning in bioinformatics. 165 papers and 20.6k citations.

Top publications by citations

The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray)
Cited by 4.4k | Open Access

We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

Profile analysis: detection of distantly related proteins.
Michael Gribskov, A. McLachlan, David Eisenberg | Proceedings of the National Academy of Sciences | 1987
Cited by 1.3k | Open Access

Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile), which is created from a group of sequences previously aligned by structural or sequence similarity. The similarity of any other sequence (target) to the group of aligned sequences (probe) can be tested by comparing the target to the profile using dynamic programming algorithms. The profile method differs in two major respects from methods of sequence comparison in common use: (i) Any number of known sequences can be used to construct the profile, allowing more information to be used in the testing of the target than is possible with pairwise alignment methods. (ii) The profile includes the penalties for insertion or deletion at each position, which allow one to include the probe secondary structure in the testing scheme. Tests with globin and immunoglobulin sequences show that profile analysis can distinguish all members of these families from all other sequences in a database containing 3800 protein sequences.
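The two core ideas in this abstract, a position-specific scoring table built from a pre-aligned probe and a dynamic-programming comparison of a target against that table, can be illustrated with a short Python sketch. This is not the 1987 implementation. Several pieces are assumptions made for illustration: the toy +2/-1 substitution function stands in for the Dayhoff matrix, the rule that lowers the gap penalty at columns where the probe already contains gaps is a hypothetical example of position-specific indel penalties, and the local (Smith-Waterman-style) recurrence with linear gap costs is a simplification. The names `build_profile`, `score_target`, and `toy_substitution` are hypothetical.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_substitution(a, b):
    """Hypothetical stand-in for the Dayhoff matrix: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def build_profile(alignment, base_gap=4.0, subst=toy_substitution):
    """Build a position-specific scoring table from pre-aligned probe sequences.

    For column j, the score of residue a is the average of subst(a, b) over the
    non-gap residues b in that column. Each column also carries its own gap
    penalty (hypothetical rule: scaled by the fraction of probe sequences that
    have a residue there, so indels are cheaper where the probe has gaps).
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("probe sequences must be aligned to equal length")

    profile = []
    for j in range(width):
        counts = Counter(seq[j] for seq in alignment if seq[j] != "-")
        n = sum(counts.values())
        scores = {
            a: (sum(c * subst(a, b) for b, c in counts.items()) / n) if n else 0.0
            for a in ALPHABET
        }
        gap_penalty = base_gap * n / len(alignment)
        profile.append((scores, gap_penalty))
    return profile


def score_target(profile, target):
    """Best local alignment score of a target sequence against the profile."""
    m = len(profile)
    prev = [0.0] * (m + 1)  # DP row for the previous target residue
    best = 0.0
    for res in target:
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            scores, gap = profile[j - 1]
            cur[j] = max(
                0.0,                                  # start a new local alignment
                prev[j - 1] + scores.get(res, 0.0),   # align residue to column j
                prev[j] - gap,                        # residue inserted after column j
                cur[j - 1] - gap,                     # column j skipped (deletion)
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    probe = ["HAGEY", "HSGEY", "HAGDY"]  # toy aligned probe, not real globin data
    profile = build_profile(probe)
    print(score_target(profile, "KKHAGEYKK"))  # target containing the motif
    print(score_target(profile, "WWWWWWWWW"))  # unrelated target
```

Because every probe sequence contributes to each column's scores, adding more aligned family members changes the table directly. Pairwise alignment cannot exploit that extra information.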

Combining evidence using p-values: application to sequence homology searches.
Timothy L. Bailey, Michael Gribskov | Bioinformatics | 1998
Cited by 1.3k | Open Access

MOTIVATION: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
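When each p-value is uniform on (0, 1) under the null hypothesis, the product Q of n independent p-values has a closed-form distribution. For an observed product x and y = -ln x, P(Q <= x) = x * sum_{i=0}^{n-1} y^i / i!. The Python sketch below evaluates that formula. It illustrates the idea described in the abstract and is not the published QFAST code. The function name `combined_p_value` and the example p-values are hypothetical.

```python
import math


def combined_p_value(p_values):
    """P-value for the product of n independent p-values.

    Assumes each p-value is Uniform(0, 1) under the null hypothesis.
    With x the observed product and y = -ln(x):
        P(Q <= x) = x * sum_{i=0}^{n-1} y**i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Work with y = -ln(product) by summing logs, so a long product of
    # small p-values is never formed directly.
    y = -sum(math.log(p) for p in p_values)

    term = 1.0   # y**0 / 0!
    total = 1.0
    for i in range(1, len(p_values)):
        term *= y / i          # builds y**i / i! incrementally
        total += term

    return min(1.0, math.exp(-y) * total)


if __name__ == "__main__":
    # Hypothetical p-values for one sequence's matches to three motifs.
    motif_p = [0.01, 0.03, 0.2]
    print(math.prod(motif_p))         # raw product, not itself a p-value
    print(combined_p_value(motif_p))  # calibrated combined p-value
```

The raw product is always smaller than the combined p-value for n > 1, which is why ranking by the product alone overstates significance. The formula gives the same result as Fisher's method, because -2 ln x follows a chi-square distribution with 2n degrees of freedom under the same independence and uniformity assumptions.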

Phylogenetic Relationships within Cation Transporter Families of Arabidopsis
Cited by 1.3k | Open Access

Uptake and translocation of cationic nutrients play essential roles in physiological processes including plant growth, nutrition, signal transduction, and development. Approximately 5% of the Arabidopsis genome appears to encode membrane transport proteins. These proteins are classified in 46 unique families containing approximately 880 members. In addition, several hundred putative transporters have not yet been assigned to families. In this paper, we have analyzed the phylogenetic relationships of over 150 cation transport proteins. This analysis has focused on cation transporter gene families for which initial characterizations have been achieved for individual members, including potassium transporters and channels, sodium transporters, calcium antiporters, cyclic nucleotide-gated channels, cation diffusion facilitator proteins, natural resistance-associated macrophage proteins (NRAMP), and Zn-regulated transporter Fe-regulated transporter-like proteins. Phylogenetic trees of each family define the evolutionary relationships of the members to each other. These families contain numerous members, indicating diverse functions in vivo. Closely related isoforms and separate subfamilies exist within many of these gene families, indicating possible redundancies and specialized functions. To facilitate their further study, the PlantsT database (http://plantst.sdsc.edu) has been created that includes alignments of the analyzed cation transporters and their chromosomal locations.

The Arabidopsis CDPK-SnRK Superfamily of Protein Kinases
Cited by 1.1k | Open Access

The CDPK-SnRK superfamily consists of seven types of serine-threonine protein kinases: calcium-dependent protein kinases (CDPKs), CDPK-related kinases (CRKs), phosphoenolpyruvate carboxylase kinases (PPCKs), PEP carboxylase kinase-related kinases (PEPRKs), calmodulin-dependent protein kinases (CaMKs), calcium and calmodulin-dependent protein kinases (CCaMKs), and SnRKs. Within this superfamily, individual isoforms and subfamilies contain distinct regulatory domains, subcellular targeting information, and substrate specificities. Our analysis of the Arabidopsis genome identified 34 CDPKs, eight CRKs, two PPCKs, two PEPRKs, and 38 SnRKs. No definitive examples were found for a CCaMK similar to those previously identified in lily (Lilium longiflorum) and tobacco (Nicotiana tabacum) or for a CaMK similar to those in animals or yeast. CDPKs are present in plants and a specific subgroup of protists, but CRKs, PPCKs, PEPRKs, and two of the SnRK subgroups have been found only in plants. CDPKs and at least one SnRK have been implicated in decoding calcium signals in Arabidopsis. Analysis of intron placements supports the hypothesis that CDPKs, CRKs, PPCKs, and PEPRKs have a common evolutionary origin; however, there are no conserved intron positions between these kinases and the SnRK subgroup. CDPKs and SnRKs are found on all five Arabidopsis chromosomes. The presence of closely related kinases in regions of the genome known to have arisen by genome duplication indicates that these kinases probably arose by divergence from common ancestors. The PlantsP database provides a resource of continuously updated information on protein kinases from Arabidopsis and other plants.