Model Selection and Model Averaging in Phylogenetics: Advantages of Akaike Information Criterion and Bayesian Approaches Over Likelihood Ratio Tests
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
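The AIC-based model averaging and parameter-importance ideas mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definitions AIC = 2k - 2 lnL and Akaike weights proportional to exp(-ΔAIC/2). The log-likelihoods, parameter counts, and function names below are hypothetical, chosen only for illustration; they are not taken from the Ohomopterus data set or from any particular software package.

```python
import math

# Hypothetical candidate models: (log-likelihood, number of free
# substitution-model parameters, parameters the model contains).
# All values are invented for illustration only.
CANDIDATES = {
    "JC":    (-3500.0, 0, set()),
    "HKY":   (-3420.0, 4, {"base_freqs", "kappa"}),
    "HKY+G": (-3400.0, 5, {"base_freqs", "kappa", "gamma_shape"}),
    "GTR+G": (-3397.0, 9, {"base_freqs", "rate_matrix", "gamma_shape"}),
}


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_scores):
    """Convert a dict of AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores.values())
    # Relative likelihood of each model given the data: exp(-delta/2).
    rel = {m: math.exp(-0.5 * (score - best)) for m, score in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def parameter_importance(weights, model_params):
    """Sum the weights of all models that include each parameter."""
    importance = {}
    for model, weight in weights.items():
        for param in model_params[model]:
            importance[param] = importance.get(param, 0.0) + weight
    return importance


aic_scores = {m: aic(lnl, k) for m, (lnl, k, _) in CANDIDATES.items()}
weights = akaike_weights(aic_scores)
importance = parameter_importance(
    weights, {m: params for m, (_, _, params) in CANDIDATES.items()}
)

for model in sorted(weights, key=weights.get, reverse=True):
    print(f"{model:6s} AIC={aic_scores[model]:8.1f} weight={weights[model]:.4f}")
for param, value in sorted(importance.items()):
    print(f"importance[{param}] = {value:.4f}")
```

In this sketch, the weights express model selection uncertainty across nested and nonnested candidates at once, and a model-averaged estimate of any quantity would be its weighted mean over the models using these weights.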
Phylogenomics resolves the timing and pattern of insect evolution
Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive, reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Incorporating Molecular Evolution into Phylogenetic Analysis, and a New Compilation of Conserved Polymerase Chain Reaction Primers for Animal Mitochondrial DNA
Chris Simon, Thomas R. Buckley, Francesco Frati et al.|Annual Review of Ecology, Evolution, and Systematics|2006
DNA data has been widely used in animal phylogenetic studies over the past 15 years. Here we review how these studies have used advances in knowledge of molecular evolutionary processes to create more realistic models of evolution, evaluate the information content of data, test phylogenetic hypotheses, attach time to phylogenies, and understand the relative usefulness of mitochondrial and nuclear genes. We also provide a new compilation of conserved polymerase chain reaction (PCR) primers for mitochondrial genes that complements our earlier compilation.
Model Misspecification and Probabilistic Tests of Topology: Evidence from Empirical Data Sets
Thomas R. Buckley|Systematic Biology|2002
Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the parametric Swofford-Olsen-Waddell-Hillis (SOWH) test, and Bayesian posterior probabilities were explored for five data sets for which all the phylogenetic relationships are known with a very high degree of certainty. These results are consistent with previous simulation studies that have indicated a tendency for the SOWH test to be prone to generating Type 1 errors because of model misspecification coupled with branch length heterogeneity. These results also suggest that the SOWH test may accord overconfidence in the true topology when the null hypothesis is in fact correct. In contrast, the SH test was observed to be much more conservative, even under high substitution rates and branch length heterogeneity. For some of those data sets where the SOWH test proved misleading, the Bayesian posterior probabilities were also misleading. The results of all tests were strongly influenced by the exact substitution model assumptions. Simple models, especially those that assume rate homogeneity among sites, had a higher Type 1 error rate and were more likely to generate misleading posterior probabilities. For some of these data sets, the commonly used substitution models appear to be inadequate for estimating appropriate levels of uncertainty with the SOWH test and Bayesian methods. Reasons for the differences in statistical power between the two maximum likelihood tests are discussed and are contrasted with the Bayesian approach.
The tuatara genome reveals ancient features of amniote evolution
The tuatara (Sphenodon punctatus), the only living member of the reptilian order Rhynchocephalia (Sphenodontia), once widespread across Gondwana, is an iconic species that is endemic to New Zealand. A key link to the now-extinct stem reptiles (from which dinosaurs, modern reptiles, birds and mammals evolved), the tuatara provides key insights into the ancestral amniotes. Here we analyse the genome of the tuatara, which, at approximately 5 Gb, is among the largest of the vertebrate genomes yet assembled. Our analyses of this genome, along with comparisons with other vertebrate genomes, reinforce the uniqueness of the tuatara. Phylogenetic analyses indicate that the tuatara lineage diverged from that of snakes and lizards around 250 million years ago. This lineage also shows moderate rates of molecular evolution, with instances of punctuated evolution. Our genome sequence analysis identifies expansions of proteins, non-protein-coding RNA families and repeat elements, the latter of which show an amalgam of reptilian and mammalian features. The sequencing of the tuatara genome provides a valuable resource for deep comparative analyses of tetrapods, as well as for tuatara biology and conservation. Our study also provides important insights into both the technical challenges and the cultural obligations that are associated with genome sequencing.