William R. Pearson

University of Virginia

ORCID: 0000-0002-0727-3680

Publishes on genomics and phylogenetic studies, RNA and protein synthesis mechanisms, and machine learning in bioinformatics. 140 papers and 29.1k citations.

Top publications by citations

Improved tools for biological sequence comparison.
William R. Pearson, David J. Lipman|Proceedings of the National Academy of Sciences|1988
Cited by 11.3k|Open Access

We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
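
The shuffling idea behind the significance evaluation described above can be illustrated with a small sketch. This is not the RDF2 program: the function names, the toy identity-based score, the window size, and the number of shuffles are all assumptions chosen for illustration. The sketch shuffles the second sequence within fixed windows, so each window keeps its own residue composition, and reports how many standard deviations the observed score lies above the shuffled scores.

```python
import random
import statistics

def identity_score(a, b):
    """Toy similarity score (an assumption, not the FASTA score):
    the largest number of identical residues over all ungapped offsets."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, ch in enumerate(b)
            if 0 <= i + offset < len(a) and a[i + offset] == ch
        )
        best = max(best, matches)
    return best

def window_shuffle(seq, window, rng):
    """Shuffle residues inside consecutive windows so that the
    composition of each window is preserved."""
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        pieces.append("".join(chunk))
    return "".join(pieces)

def shuffle_z_score(query, target, n_shuffles=200, window=10, seed=0):
    """Compare the observed score with scores against shuffled targets.
    Returns (observed, mean, sd, z); z is 0.0 if the shuffled scores
    show no variation."""
    rng = random.Random(seed)
    observed = identity_score(query, target)
    shuffled = [
        identity_score(query, window_shuffle(target, window, rng))
        for _ in range(n_shuffles)
    ]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    z = (observed - mean) / sd if sd > 0 else 0.0
    return observed, mean, sd, z
```

Calling `shuffle_z_score(query, target)` on two related sequences should give a z value well above that of unrelated sequences; a real evaluation would use the same scoring matrix and alignment method as the search itself.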

Rapid and Sensitive Protein Similarity Searches
Cited by 4.1k

An algorithm was developed which facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases. Because of the algorithm's efficiency on many microcomputers, sensitive protein database searches may now become a routine procedure for molecular biologists. The method efficiently identifies regions of similar sequence and then scores the aligned identical and differing residues in those regions by means of an amino acid replaceability matrix. This matrix increases sensitivity by giving high scores to those amino acid replacements which occur frequently in evolution. The algorithm has been implemented in a computer program designed to search protein databases very rapidly. For example, comparison of a 200-amino-acid sequence to the 500,000 residues in the National Biomedical Research Foundation library would take less than 2 minutes on a minicomputer, and less than 10 minutes on a microcomputer (IBM PC).
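
The two stages described above (quickly finding regions of similar sequence, then scoring them with a replaceability matrix) can be sketched as follows. This is a simplified illustration, not the published program: the word-lookup approach, the function names, the word length `k`, and the toy scoring values (5 for identity, 2 for a replacement within an assumed conservative group, -2 otherwise) are assumptions, and the groups are not taken from any real matrix.

```python
from collections import defaultdict

# Hypothetical conservative-replacement groups, for illustration only.
CONSERVATIVE_GROUPS = ["ILVM", "DE", "KR", "ST", "FYW"]

def pair_score(a, b):
    """Toy replaceability score: identities score highest, assumed
    conservative replacements score positive, everything else negative."""
    if a == b:
        return 5
    if any(a in g and b in g for g in CONSERVATIVE_GROUPS):
        return 2
    return -2

def diagonal_hits(query, target, k=2):
    """Count shared k-residue words per diagonal (query index minus
    target index) using a lookup table built from the query."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits[i - j] += 1
    return dict(hits)

def best_region_score(query, target, diag):
    """Score the best ungapped region on one diagonal, counting both
    identical and differing residues (maximum-scoring run)."""
    best = running = 0
    for j, residue in enumerate(target):
        i = j + diag
        if 0 <= i < len(query):
            running = max(0, running + pair_score(query[i], residue))
            best = max(best, running)
    return best

def quick_similarity(query, target, k=2):
    """Pick the diagonal with the most word hits, then rescore it.
    Returns (diagonal, score), or (None, 0) if no words are shared."""
    hits = diagonal_hits(query, target, k)
    if not hits:
        return None, 0
    diag = max(hits, key=hits.get)
    return diag, best_region_score(query, target, diag)
```

The word lookup keeps the first stage fast because only shared words are examined; the matrix-based rescoring is applied to a single diagonal rather than to every possible alignment.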

An Introduction to Sequence Similarity (“Homology”) Searching
William R. Pearson|Current Protocols in Bioinformatics|2013
Cited by 946|Open Access

Sequence similarity searching, typically with BLAST, is the most widely used and most reliable strategy for characterizing newly determined sequences. Sequence similarity searches can identify "homologous" proteins or genes by detecting excess similarity: statistically significant similarity that reflects common ancestry. This unit provides an overview of the inference of homology from significant similarity, and introduces other units in this chapter that provide more details on effective strategies for identifying homologs.

Hereditary differences in the expression of the human glutathione transferase active on trans-stilbene oxide are due to a gene deletion.
Janeric Seidegård, William R. Vorachek, R W Pero et al.|Proceedings of the National Academy of Sciences|1988
Cited by 732|Open Access

Glutathione transferase (GT; EC 2.5.1.18) mRNA levels were measured in human liver samples by using mouse and human cDNA clones that encode class-mu and class-alpha GT. Although all the RNA samples examined contained class-alpha GT mRNA, class-mu GT mRNA was found only in individuals whose peripheral leukocytes expressed GT activity on the substrate trans-stilbene oxide. The mouse class-mu cDNA clone was used to identify a human class-mu GT cDNA clone, lambda GTH411. The amino acid sequence of the GT encoded by lambda GTH411 is identical with the 23 residues determined for the human liver GT-mu isoenzyme and shares 76-81% identity with mouse and rat class-mu GT isoenzymes. The mouse and human class-mu GT cDNA inserts hybridize with multiple BamHI and EcoRI restriction fragments in the human genome. One of these hybridizing fragments is missing in the DNA of individuals who lack GT activity on trans-stilbene oxide. Hybridizations with nonoverlapping subfragments of lambda GTH411 suggest that there are at least three class-mu genes in the human genome. One of these genes appears to be deleted in individuals lacking GT activity on trans-stilbene oxide.