PANTHER: A Library of Protein Families and Subfamilies Indexed by Function
In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale. We describe a method, PANTHER, for relating protein sequence relationships to function relationships in a robust and accurate way. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of "books," each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM), and a family tree. Functional divergence within the family is represented by dividing the tree into subtrees based on shared function, and by subtree HMMs. PANTHER/X is an abbreviated ontology for summarizing and navigating molecular functions and biological processes associated with the families and subfamilies. We apply PANTHER to three areas of active research. First, we report the size and sequence diversity of the families and subfamilies, characterizing the relationship between sequence divergence and functional divergence across a wide range of protein families. Second, we use the PANTHER/X ontology to give a high-level representation of gene function across the human and mouse genomes. Third, we use the family HMMs to rank missense single nucleotide polymorphisms (SNPs), on a database-wide scale, according to their likelihood of affecting protein function.
Wiskott–Aldrich Syndrome Protein, a Novel Effector for the GTPase CDC42Hs, Is Implicated in Actin Polymerization
From genes to proteins: High-throughput expression and purification of the human proteome
Joanna S. Albala, Ken Franke, Ian R. McConnell et al.|Journal of Cellular Biochemistry|2000
The development of high-throughput methods for gene discovery has paved the way for the design of new strategies for genome-scale protein analysis. Lawrence Livermore National Laboratory and Onyx Pharmaceuticals, Inc., have produced an automatable system for the expression and purification of large numbers of proteins encoded by cDNA clones from the IMAGE (Integrated Molecular Analysis of Genomes and Their Expression) collection. This high-throughput protein expression system has been developed for the analysis of the human proteome, the protein equivalent of the human genome, comprising the translated products of all expressed genes. Functional and structural analysis of novel genes identified by EST (Expressed Sequence Tag) sequencing and the Human Genome Project will be greatly advanced by the application of this high-throughput expression system for protein production. A prototype was designed to demonstrate the feasibility of our approach. Using a PCR-based strategy, 72 unique IMAGE cDNA clones have been used to create an array of recombinant baculoviruses in a 96-well microtiter plate format. Forty-two percent of these cDNAs successfully produced soluble, recombinant protein. All of the steps in this process, from PCR to protein production, were performed in 96-well microtiter plates, and are thus amenable to automation. Each recombinant protein was engineered to incorporate an epitope tag at the amino terminal end to allow for immunoaffinity purification. Proteins expressed from this system are currently being analyzed for functional and biochemical properties.
Assessment of Utility of ESTs for Nucleotide Diversity Using Available Assembled Alignments from dbEST, STACK 2.0 and STACK-INDEX
Brian Karlak, Yoshihide Hayashizaki|Proceedings Genome Informatics Workshop/Genome informatics|1998
Single Nucleotide Polymorphisms (SNPs) in virtual expressed gene fragment alignments represent a potentially significant resource for the detection of both non-coding and coding sequence variations. We have clustered and assembled 767 866 human ESTs into 76 131 alignments localised to specific tissues [1]. In addition, we have clustered and aligned 300 000 consensus sequences and unclustered ESTs to generate a comprehensive human gene index of over 38 000 unique linked virtual transcripts (STACKINDEX) with associated alignments. The resulting dataset is a potentially rich resource for the detection and characterisation of alternate splicing and polymorphisms. Public access to these data will allow investigators to add functional and scientific value to the emerging human gene sequences [2]. We have surveyed the dataset and have developed an initial set of criteria for assessment of possible high likelihood SNPs. We have studied the protein p53 as a model for the system.
Refined music clustering