Petr Danecek

Wellcome Sanger Institute

ORCID: 0000-0002-4159-1666

Publishes on Genomics and Rare Diseases; Genetic Associations and Epidemiology; and Genomic variations and chromosomal abnormalities. 94 papers and 95k citations.

Top publications by citations

The variant call format and VCFtools
Petr Danecek, Adam Auton, Gonçalo R. Abecasis et al. | Bioinformatics | 2011
Cited by 17.6k | Open Access

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net
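As a rough illustration of the format the summary describes, the sketch below reads tab-separated VCF data lines (the eight fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and selects variants in a position range. It is not VCFtools code and does not use its Perl API: the names `Variant`, `parse_vcf` and `in_region` and the sample records are invented for this example, and the range lookup is a plain linear scan rather than the compressed, indexed retrieval the summary mentions.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    """One VCF data line, reduced to the fields used below (hypothetical type)."""
    chrom: str
    pos: int               # 1-based position on the reference
    ref: str
    alts: List[str]        # ALT is a comma-separated list of alleles
    info: Dict[str, str]   # INFO annotations as key -> value


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping header lines that start with '#'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first eight tab-separated columns are fixed by the format.
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        info_map: Dict[str, str] = {}
        if info != ".":  # '.' marks a missing value
            for item in info.split(";"):
                key, _, value = item.partition("=")
                info_map[key] = value  # flag-style keys get an empty string
        yield Variant(chrom, int(pos), ref, alt.split(","), info_map)


def in_region(variants: Iterable[Variant], chrom: str, start: int, end: int) -> List[Variant]:
    """Linear-scan stand-in for an indexed range query (inclusive bounds)."""
    return [v for v in variants if v.chrom == chrom and start <= v.pos <= end]


# Made-up records for illustration only.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=12;DB",
    "1\t250\t.\tCT\tC\t.\t.\t.",
    "2\t100\t.\tG\tA,T\t30\tPASS\tDP=7",
]

hits = in_region(parse_vcf(EXAMPLE), "1", 200, 300)
for v in hits:
    print(v.chrom, v.pos, v.ref, ",".join(v.alts))
```

For real files, which are typically compressed and indexed, a dedicated tool or library is the appropriate choice; the scan above only shows what a region query returns.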

Twelve years of SAMtools and BCFtools
Petr Danecek, James Bonfield, Jennifer Liddle et al. | GigaScience | 2021
Cited by 15.5k | Open Access

BACKGROUND: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. FINDINGS: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. CONCLUSION: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

Mouse genomic variation and its effect on phenotypes and gene regulation
Cited by 1.8k | Open Access

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

The laboratory mouse has become the workhorse of biomedical research. The draft sequence of the mouse reference genome was published in 2002, but some forms of variation are still poorly documented. Two papers in this issue go a long way towards filling the gaps. The generation and analysis of sequence from 17 key mouse genomes, including most of the commonly used inbred strains and their progenitors, reveal extensive genetic variation and provide insights into the molecular nature of functional variants as well as the phylogenetic history of the lab mouse. The data will be an important resource for a new era of functional analysis. The second paper describes the landscape of structural variants in the genomes of 13 classical and four wild-derived inbred mouse strains, mapping many of them to base-pair resolution. Despite their prevalence, structural variants are shown to have a relatively small impact on phenotypic variation.