Biogen (United States)
Publishes on Genomics and Phylogenetic Studies, Biomedical Text Mining and Ontologies, Genetics, Bioinformatics, and Biomedical Research. 21 papers and 4.9k citations.
Add your photo, update your bio, and get notified when your ranking changes.
Meta-analytic methods were used to integrate the findings of a sample of Monte Carlo studies of the robustness of the F test in the one- and two-factor fixed effects ANOVA models. Monte Carlo results for the Welch (1947) and Kruskal-Wallis (Kruskal & Wallis, 1952) tests were also analyzed. The meta-analytic results provided strong support for the robustness of the Type I error rate of the F test when certain assumptions were violated. The F test also showed excellent power properties. However, the Type I error rate of the F test was sensitive to unequal variances, even when sample sizes were equal. The error rate of the Welch test was insensitive to unequal variances when the population distribution was normal, but nonnormal distributions tended to inflate its error rate and to depress its power. Meta-analytic and exact statistical theory results were used to summarize the effects of assumption violations for the tests.
In this report we address the problem of accurate statistical modeling of DNA sequences, either coding or noncoding, for a bacterial species whose genome (or a large portion) was sequenced but not yet characterized experimentally. Availability of these models is critical for successful solution of the genome annotation task by statistical methods of gene finding. We present the method, GeneMark-Genesis, which learns the parameters of Markov models of protein-coding and noncoding regions from anonymous bacterial genomic sequence. These models are subsequently used in the GeneMark and GeneMark.hmm gene-finding programs. Although there is basically one model of a noncoding region for a given genome, several models of protein-coding region are automatically obtained by GeneMark-Genesis. The diversity of protein-coding models reflects the diversity of oligonucleotide compositions, particularly the diversity of codon usage strategies observed in genes from one and the same genome. In the simplest and the most important case, there are just two gene models-typical and atypical ones. We show that the atypical model allows one to predict genes that escape identification by the typical model. Many genes predicted by the atypical model appear to be horizontally transferred genes. The early versions of GeneMark-Genesis were used for annotating the genomes of Methanoccocus jannaschii and Helicobacter pylori. We report the results of accuracy testing of the full-scale version of GeneMark-Genesis on 10 completely sequenced bacterial genomes. Interestingly, the GeneMark.hmm program that employed the typical and atypical models defined by GeneMark-Genesis was able to predict 683 new atypical genes with 176 of them confirmed by similarity search.