
Thomas Keane

European Bioinformatics Institute

ORCID: 0000-0001-7532-6898

Publishes on Genomics and Phylogenetic Studies, Genomics and Rare Diseases, Genetic Mapping and Diversity in Plants and Animals. 207 papers and 94.2k citations.



Top publications by citations

Twelve years of SAMtools and BCFtools
Petr Danecek, James Bonfield, Jennifer Liddle et al.|GigaScience|2021
Cited by 15.5k|Open Access

BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

Mouse genomic variation and its effect on phenotypes and gene regulation
Cited by 1.8k|Open Access

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

The laboratory mouse has become the workhorse of biomedical research. The draft sequence of the mouse reference genome was published in 2002, but some forms of variation are still poorly documented. Two papers in this issue go a long way towards filling the gaps. The generation and analysis of sequence from 17 key mouse genomes, including most of the commonly used inbred strains and their progenitors, reveal extensive genetic variation and provide insights into the molecular nature of functional variants as well as the phylogenetic history of the lab mouse. The data will be an important resource for a new era of functional analysis. The second paper describes the landscape of structural variants in the genomes of 13 classical and four wild-derived inbred mouse strains, mapping many of them to base-pair resolution. Despite their prevalence, structural variants are shown to have a relatively small impact on phenotypic variation.

The UK10K project identifies rare variants in health and disease
Cited by 1.2k|Open Access

The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results. Low read depth sequencing of whole genomes and high read depth exomes of nearly 10,000 extensively phenotyped individuals are combined to help characterize novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with lipid-related traits; in addition to describing population structure and providing functional annotation of rare and low-frequency variants the authors use the data to estimate the benefits of sequencing for association studies. This paper, combining data and initial findings from the different arms of the UK10K project, describes insights from low-read-depth sequencing of whole genomes or high-read-depth exome sequencing of nearly 10,000 individuals sampled from a range of disease collections, as well as participants from healthy population based cohorts. 
The authors characterize novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with lipid-related traits. In addition to describing population structure and providing functional annotation of rare and low frequency variants, they use the data to estimate the benefits of sequencing for association studies.

Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified
Thomas Keane, Christopher J. Creevey, Melissa M. Pentony et al.|BMC Evolutionary Biology|2006
Cited by 1.1k|Open Access

BACKGROUND: In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. RESULTS: We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. CONCLUSION: This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.

A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains
Cited by 524|Open Access

BACKGROUND: The mouse inbred line C57BL/6J is widely used in mouse genetics and its genome has been incorporated into many genetic reference populations. More recently large initiatives such as the International Knockout Mouse Consortium (IKMC) are using the C57BL/6N mouse strain to generate null alleles for all mouse genes. Hence both strains are now widely used in mouse genetics studies. Here we perform a comprehensive genomic and phenotypic analysis of the two strains to identify differences that may influence their underlying genetic mechanisms. RESULTS: We undertake genome sequence comparisons of C57BL/6J and C57BL/6N to identify SNPs, indels and structural variants, with a focus on identifying all coding variants. We annotate 34 SNPs and 2 indels that distinguish C57BL/6J and C57BL/6N coding sequences, as well as 15 structural variants that overlap a gene. In parallel we assess the comparative phenotypes of the two inbred lines utilizing the EMPReSSslim phenotyping pipeline, a broad based assessment encompassing diverse biological systems. We perform additional secondary phenotyping assessments to explore other phenotype domains and to elaborate phenotype differences identified in the primary assessment. We uncover significant phenotypic differences between the two lines, replicated across multiple centers, in a number of physiological, biochemical and behavioral systems. CONCLUSIONS: Comparison of C57BL/6J and C57BL/6N demonstrates a range of phenotypic differences that have the potential to impact upon penetrance and expressivity of mutational effects in these strains. Moreover, the sequence variants we identify provide a set of candidate genes for the phenotypic differences observed between the two strains.