Simon D. W. Frost

Microsoft (United States)

ORCID: 0000-0002-5207-9879

Publishes on HIV Research and Treatment, HIV/AIDS Research and Interventions, HIV/AIDS drug development and treatment. 241 papers and 22.9k citations.

241 Publications
22.9k Total Citations

Top publications by citations

HyPhy: hypothesis testing using phylogenies
Cited by 3k | Open Access

The HyPhy package is designed to provide a flexible and unified platform for carrying out likelihood-based analyses on multiple alignments of molecular sequence data, with the emphasis on studies of rates and patterns of sequence evolution. AVAILABILITY: http://www.hyphy.org CONTACT: muse@stat.ncsu.edu SUPPLEMENTARY INFORMATION: HyPhy documentation and tutorials are available at http://www.hyphy.org.

Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection
Sergei L. Kosakovsky Pond, Simon D. W. Frost|Molecular Biology and Evolution|2005
Cited by 2.4k

We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.

Datamonkey: rapid detection of selective pressure on individual sites of codon alignments
Sergei L. Kosakovsky Pond, Simon D. W. Frost|Computer applications in the biosciences|2005
Cited by 1.4k | Open Access

Datamonkey is a web interface to a suite of cutting-edge maximum likelihood-based tools for identification of sites subject to positive or negative selection. The methods range from very fast data exploration to some of the most complex models available in public domain software, and are implemented to run in parallel on a cluster of computers. AVAILABILITY: http://www.datamonkey.org. In the future, we plan to expand the collection of available analytic tools, and provide a package for installation on other systems.

Producing polished prokaryotic pangenomes with the Panaroo pipeline
Cited by 1.3k | Open Access

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo .

Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology
Wayne Delport, Art F. Y. Poon, Simon D. W. Frost et al.|Bioinformatics|2010
Cited by 1.2k | Open Access

Datamonkey is a popular web-based suite of phylogenetic analysis tools for use in evolutionary biology. Since the original release in 2005, we have expanded the analysis options to include recently developed algorithmic methods for recombination detection, evolutionary fingerprinting of genes, codon model selection, co-evolution between sites, identification of sites that rapidly escape host-immune pressure, and HIV-1 subtype assignment. The traditional selection tools have also been augmented to include recent developments in the field. Here, we summarize the analysis options currently available on Datamonkey, and provide guidelines for their use in evolutionary biology. Availability and documentation: http://www.datamonkey.org.